Preface


Then I saw that all toil and all skill in work come from a man’s envy of his neighbour. 
This also is vanity and a striving after wind. 


Ecclesiastes, 4:4 


Teacher: may you be eager to make your students understand quickly what has cost 
you hours of study to see clearly. 


J. Escriva, Furrow, 229 


This is a monograph on Bell nonlocality. If you are reading it, you probably have already 
got an idea of what it is all about—at any rate, all the motivation and introduction have 
been packed in chapter 1 for the sake of its self-consistency, so I won’t repeat them here. 
Let me just explain what this book tries to be, and what it is certainly not. 

This book has been written with the clear awareness that nonlocality is a phenomenon,
i.e., a fact of observation. It is neither speculation, nor a notion mediated by a theory:
It can be defined in its own right, and tested in its own right. The fact that quantum 
theory predicts those observations is a remarkable feat for that theory. The fact that the 
idea was arrived at by purely foundational thinking, without any application in mind, is 
an encouragement not to fall into strict pragmatism. But, beyond historical avatars and 
our current theories, the phenomenon is here to stay—just like apples will keep falling, 
unaware of Newton and Einstein, indifferent to whether we attribute their trajectory to 
a force or a curvature. 

Because of this, I have opted for presenting the phenomenon from our current 
perspective: The reader won’t find here a presentation of the historical debates or 
the exegesis of the masters’ writings. The book focuses on the concepts and on the 
mathematical description of nonlocality, both per se and in the context of quantum theory. 
It is structured as follows: 

Part I contains the core material (although chapters 4 and 5 could be just skimmed 
through in a first reading). 

Part II shows the applied side¹ of nonlocality: “Device-independent” certification of
devices in the context of quantum technologies. 

Part III extends towards more foundational matters related to nonlocality. 

The Appendices expand on topics that are important but would have clogged the 
flow of the main text. 


1 Which must not be read as “the dark side”: If that were my stance, I would not have devoted so much of 
my research and a whole part of this book to it. 




A famous old joke says that Italians write fifty-page pamphlets entitled Tutto su, while 
German multi-volume treatises are called Einführung zu. As an Italian academically 
formed in Switzerland,² I have tried to hit somewhere in the middle. In particular:


— Parts II and III were written keeping in mind that this is a book on nonlocality. 
Thus, part II does not try to be a manual of device-independent certification (soon, 
one will be needed). Similarly, part III is not to be taken as an abrégé of foundations 
of physics: Even staying only in the neighborhood of discussions about quantum 
theory, very important topics have not been mentioned. 


— Experimental evidence is crucial to science, and evidence for Bell nonlocality 
is abundant, as unquestionable as for any other physical phenomenon. I have 
scattered some citations to a few important milestones, but there is no chapter
devoted to experiments, only a very sketchy appendix. For challenges in the lab, 
others are more likely than myself to provide relevant insight. 


— Some topics have been left out, which could have been fitted here or there: 
monogamy of nonlocality, the Grothendieck constant, inequalities with unbounded
violation, random choice of measurement settings ... References about these
and other missing topics can be found in a review article that I was honored to 
co-author [Brunner, Cavalcanti, Pironio, Scarani, and Wehner (2014)]. 


Since we are speaking of a review article: This book is not one. I have made a conscious 
effort to give credit where it is due, notably when attributing priorities; but also, to keep 
the list of references down to a reasonable length. I have written under the assumption 
that the internet exists, and that starting from a few relevant references the reader will be 
quickly able to access related works. 


2 I know that Switzerland is not Germany, and besides I was based in Suisse romande—but at least I was 
North of the Alps for quite a while. 
Part I


Classic Bell Nonlocality 


This first part deals with Bell nonlocality per se, following a synthetic rather than a 
historical outline. The first two chapters deal with the definition of Bell nonlocality, the 
conditions under which it can be demonstrated, and its mathematical formalization. The 
description of the phenomenon within quantum theory is then introduced, leading to a 
few general results and a large number of examples. 


First Encounter with Bell Nonlocality 


Misunderstandings of Bell’s theorem happen so fast that they violate locality.
R. Munroe, XKCD 


This chapter serves both as an introduction to the book, and as a self-contained first 
presentation of Bell nonlocality. 


1.1 Three Roles for Bell Nonlocality 


Few scientific statements are more radical than one of the core tenets of quantum physics: 
There is indeterminacy in nature. It has accompanied quantum theory since its earliest 
moments: Only a few months after Heisenberg and Schrödinger independently defined
the definitive formalism, Max Born suggested that the laws of the new theory should be 
seen as intrinsically statistical. This was to become the orthodox view. Sensing the danger, 
Einstein quickly wrote to Born his conviction that a theory with statistical laws could only 
be a temporary fix, and that determinism should ultimately be recovered. The debate 
continued for decades with a few flares, notably the celebrated EPR paper (Einstein,
Podolsky, and Rosen, 1935) and Bohr’s immediate reply, but in an atmosphere of overall 
indifference among physicists at large. In those years, the excitement about quantum 
theory was not found in debating its meaning, but in its almost boundless predictive 
power. It has become commonplace to refer to the attitude of those years by Mermin’s 
dictum “shut up and calculate.” 

Ultimately, the statistical language became the standard to which generations of 
physicists conformed out of inertia. If asked for evidence of indeterminacy, still today 
many would refer to Heisenberg’s uncertainty relations, which however can only vouch for
indeterminacy in quantum theory, not in nature (see Appendix A.1). This is surprising 
because direct evidence has been compelling since 1964, thanks to the work of John 
Bell (Bell, 1964). He showed that the possibility of recovering a deterministic model is 
amenable to experimental falsification, through the observation of a phenomenon that 
we shall call Bell nonlocality. In a first approach, Bell’s argument is mathematically simple 
(see sections 1.3–1.4); because of its importance, it has been submitted to a thorough
scrutiny, from which it has emerged unscathed and actually strengthened by more solid 
foundations (see section 1.5 and chapter 2). 
In 1964 there was already a huge amount of experimental evidence supporting the
validity of quantum theory. Nevertheless, none of those data could be used to check 
Bell’s criterion. Dedicated experiments had to be designed. The work of Alain Aspect 
and coworkers is credited as the first conclusive evidence of Bell nonlocality (Aspect 
et al., 1982b; Aspect et al., 1982a). The evidence has been steadily growing since then;
eventually, three independent experiments reported in 2015 are considered definitive 
(Hensen et al., 2015; Giustina et al., 2015; Shalm et al., 2015). The main text of this 
book does not describe experiments; to facilitate reading the experimental literature, a 
quick guide is provided as Appendix B. 

Discovered thanks to quantum theory, indeterminacy has been vindicated as a 
physical fact, independent of the theory itself. It can be circumvented only at the price 
of adopting even more radical postures about physics and nature themselves (see section 
1.6). This direct vindication of indeterminism is the original motivation of Bell nonlocality. 
For a few decades, it was held to be its sole role too. Those who, for various reasons, were 
already won to the indeterministic cause had taken note of it and moved on. A series of 
works that started around 2005 have uncovered a second role: Bell nonlocality provides
the most compelling certification of the correct functioning of some quantum devices, like those 
required to perform quantum cryptography and quantum computation. The fabrication 
of these devices and the development of certification tools based on nonlocality still 
constitute technical challenges, but we’ll have to get there. Far from being an exercise 
in scientific archaeology, this book contains material that future quantum engineers will 
have to master: in nuce at least, this is a treatise in applied physics.

Finally, as a phenomenon independent of quantum theory, Bell nonlocality is not 
merely an instrument for a negative task (falsifying determinism): It has a right to 
citizenship in physics. As its third role, Bell nonlocality can be used as a principle constraining 
possible candidates for physical theories. Barring a few pioneering insights, this approach 
was also started after the year 2000. It has already contributed several new ideas and 
notions to the field of foundations of physics but is still very open to future developments. 

These three roles of Bell nonlocality—evidence for indeterminism, certification tool 
for devices, and guideline for foundations—correspond to the three parts into which this 
book is divided, but pervade the whole text. With them in mind, we can enter the core 
of the subject. 


1.2 Introducing Bell Nonlocality 


1.2.1 Setting Bell tests: Laboratories and games 


Tests of Bell nonlocality, or Bell tests for short, are currently experimental setups in physics 
laboratories. For some years, theorists have rather chosen to present Bell tests as games 
that bear some analogy with TV quizzes, polls, exams, judicial trials, and other familiar 
situations.¹ The game setting is definitely better to bring up the essence of nonlocality,


1 The reader may come back to this list after becoming familiar with Bell tests, to find analogies and
differences. For instance, in exams, the content of the answer matters, and the verifier will evaluate the 
Figure 1.1 Sketch of a Bell test for two players (the generalization to more players is straightforward)
in the laboratory setting (top) and in the game setting (bottom). After having agreed on a process for
that round, each player receives an input and has to provide an output; the data of several rounds are
then sorted to establish the correlations between the outputs $a$ and $b$ for any pair $(x, y)$ of inputs. In the
laboratory setting, we give in grey the usual representation of the process in quantum theory: A quantum
state $\rho$ is prepared and some measurements are chosen; which measurement is actually performed in
each round is determined by the input. None of this enters the definition of Bell nonlocality. In the game
setting, we introduce a verifier V that queries the players Alice and Bob and collects their answers.


and we shall mostly follow it in this book. Nonetheless, as we shall also see, many 
important discussions cannot be fully appreciated without going back to the lab. The 
two settings are sketched and compared in Figure 1.1. 

In a Bell test game, the players, referred to alphabetically as Alice, Bob, Charlie, etc., are
all on the same team. The game consists of many rounds. In each round, the players will be
separated: Each will receive a query (input) and will have to provide an answer (output).
It is useful to think of a verifier distributing the inputs² and collecting the outputs.

The rules of the game and the list of possible queries are known in advance. The 
players are allowed to prepare a common strategy before the game, which consists in 
deciding which process they will use in each round of the game. We shall also speak of 
the resources that are used in these processes. If the players were allowed to communicate 
among each other during the game, they would actually not be separated and could easily 
win any game of this type: The most powerful resources are signaling ones. The case 
is more interesting with no-signaling resources. The most elementary example of a no- 
signaling resource is a list of pre-determined outputs, one for each possible input (that 
is, the process consists in producing the output by reading the list). It is no-signaling 


performance of each player without any concern for correlations. In judicial trials, the players’ goal is to provide a 
consistent version of the story (which may not be the truth), but they don’t know in advance the set of questions 
that they may be asked; etc. 


2 In actual experiments, the inputs are usually generated at each player’s location by a “random number 
generator”: The image of the verifier allows us to postpone the delicate discussion about randomness and its 
generation with physical means (subsection 1.5.3). 
because, if Alice does something to her list, the other players obviously won’t notice
anything. In other words, Alice can’t send a message to others by manipulating her list. 
For instance, consider three games, each defined by one of the following rules: 


(i) The players must produce the same answer if they receive the same query.
(ii) The players must produce the same answer if and only if they receive the same
query.
(iii) The players must produce different answers if both receive query “1,” the same
answer otherwise.


A game based on rule (i) is trivially won by the players agreeing on a fixed common 
output. A game based on rule number (ii) can be similarly won by agreeing on a pre- 
determined output for each input, provided that the number of inputs is not larger than 
the number of outputs. If there are more inputs than outputs, the game cannot be
won with a list of pre-determined answers. Finally, no strategy based on pre-determined 
answers can win a game based on rule (iii). 
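
The three claims above can be checked by brute force. The following is a minimal sketch, assuming two players, binary outputs, and queries labeled 0 and 1 (plus a third query 2 for the larger instance of rule (ii)); the function names are illustrative and not part of the text. It enumerates every pair of pre-determined lists and counts those that win for all pairs of queries.

```python
from itertools import product

def winning_lists(rule, inputs, outputs):
    """Return every pair of pre-determined lists (Alice's, Bob's) that
    satisfies the rule for all pairs of queries."""
    lists = [dict(zip(inputs, outs))
             for outs in product(outputs, repeat=len(inputs))]
    return [(la, lb) for la in lists for lb in lists
            if all(rule(x, y, la[x], lb[y]) for x in inputs for y in inputs)]

def rule_i(x, y, a, b):
    # Same answer if same query; no constraint otherwise.
    return a == b if x == y else True

def rule_ii(x, y, a, b):
    # Same answer if and only if same query.
    return (a == b) == (x == y)

def rule_iii(x, y, a, b):
    # Different answers if both receive query 1, same answer otherwise.
    return a != b if (x == 1 and y == 1) else a == b

n_rule_i = len(winning_lists(rule_i, [0, 1], [0, 1]))
n_rule_ii_2in = len(winning_lists(rule_ii, [0, 1], [0, 1]))     # inputs <= outputs
n_rule_ii_3in = len(winning_lists(rule_ii, [0, 1, 2], [0, 1]))  # inputs > outputs
n_rule_iii = len(winning_lists(rule_iii, [0, 1], [0, 1]))
print(n_rule_i, n_rule_ii_2in, n_rule_ii_3in, n_rule_iii)
```

Under these assumptions, winning lists exist for rule (i) and for rule (ii) with two queries, while none exists for rule (ii) with three queries or for rule (iii).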


1.2.2 The definition of Bell nonlocality 


Bell locality means that the process by which each player generates the output does not take 
into account the other player’s input. In other words, all correlations between the players’ 
outputs are due to the shared resource, on whose nature no assumption is made: It can be
anything, from a list of numbers on a piece of paper to two jointly programmed quantum 
computers. When Bell locality does not hold we speak of Bell nonlocality. 

This notion of locality can be formalized as follows. Denote by $\lambda$ the process. It
does not need to be deterministic, so we can say that Alice generates $a$ by sampling
from a probability distribution $P_\lambda(a|x)$. What is crucial is that this does not take Bob’s
input $y$ into account. Similarly, Bob generates $b$ locally by sampling from a probability
distribution $P_\lambda(b|y)$. If this is the case, the statistics observed by the verifier (who is not
privy to $\lambda$) will be described by


$$P(a,b|x,y) = \int d\lambda \, Q(\lambda) \, P_\lambda(a|x) \, P_\lambda(b|y) \tag{1.1}$$


where $Q(\lambda)$ is the probability distribution that describes the strategy, i.e., how often
a specific process $\lambda$ is used. Bell locality is clearly a restriction: Not all conceivable
$P(a,b|x,y)$ can be written in this form. The extreme counterexample is a strategy that
wins the game in which each player is supposed to output the other player’s input. Also,
any winning strategy for the game based on rule (iii) requires one of the players to sample
from a distribution that depends on the other player’s input.

Statistics will be called local if they can be written in the form (1.1), nonlocal if
they cannot. A Bell test is a game whose winning strategy is described by nonlocal 
statistics. 
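
To make the form (1.1) concrete, here is a minimal sketch under simplifying assumptions that are not in the text: the process label takes only three values, the integral becomes a weighted sum, and each process is a pair of pre-determined lists. The weights and lists are arbitrary illustrative choices. The code builds the statistics and computes Alice's marginal, which for any statistics of this form cannot depend on Bob's input.

```python
from itertools import product

INPUTS = (0, 1)
OUTPUTS = (-1, +1)

def local_statistics(strategy):
    """Build P(a,b|x,y) as a weighted sum over processes.

    `strategy` is a list of (weight, alice_list, bob_list): the weight plays
    the role of Q(lambda), and each list maps an input to a fixed output."""
    stats = {key: 0.0 for key in product(OUTPUTS, OUTPUTS, INPUTS, INPUTS)}
    for weight, alice_list, bob_list in strategy:
        for x, y in product(INPUTS, INPUTS):
            stats[(alice_list[x], bob_list[y], x, y)] += weight
    return stats

def alice_marginal(stats, a, x, y):
    """P(a|x,y), obtained by summing over Bob's output."""
    return sum(stats[(a, b, x, y)] for b in OUTPUTS)

# Hypothetical strategy: three processes with weights summing to one.
strategy = [
    (0.5, {0: +1, 1: +1}, {0: +1, 1: -1}),
    (0.3, {0: -1, 1: +1}, {0: -1, 1: -1}),
    (0.2, {0: -1, 1: -1}, {0: +1, 1: +1}),
]
P = local_statistics(strategy)

for a, x in product(OUTPUTS, INPUTS):
    print(a, x, alice_marginal(P, a, x, 0), alice_marginal(P, a, x, 1))
```

Each printed row shows the same marginal for both values of Bob's input, as expected from the structure of (1.1).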
1.2.3 On resources and semantics


If nonlocal statistics are observed, the verifier knows that the players have shared a 
nonlocal resource. If local statistics are observed, we can’t say much about the resource: 
The players might have shared a potentially nonlocal one but have used it poorly. This 
sounds like elementary logic, but it triggers two crucial remarks: 


• The definition of Bell locality does not rely on a prior characterization of the
class of “local resources”; even less does one need to assume that there exist in nature
resources that are intrinsically local in this sense.³ The opposite is the case: From the
definition of Bell locality, that stands its ground, one can define “local resources” as 
hypothetical resources that could only lead to local statistics. The traditional name 
for such local resources is “local hidden variables” (LHVs) or simply “local variables” 
(LVs), the word “hidden” being a relic of the discussions on quantum theory. 


• We'll see in chapter 2 that every local statistic (1.1) can even be realized with a
strategy based on pre-determined outputs. This result, known as Fine’s theorem, 
is the basis for the mathematical tools of the field. But we cannot infer from 
this theorem that all observed local behaviors are actually generated with pre- 
determined outputs, nor that the definition of Bell locality assumes determinism. 


Now, were it not for quantum theory, the definition of Bell nonlocality would sound 
both uninteresting and uncontroversial: Nonlocal resources would be communication 
devices. Quantum theory,⁴ however, forces us to enlarge, at least in principle, the list of
possible nonlocal resources. Let us then consider that the players share physical systems
in a state that quantum theory describes as $\rho_{AB}$, and let’s assume that the process that
produces the outputs is performing local measurements on this state. Specifically, upon
receiving her input $x$, Alice performs a measurement on her system, where the output $a$
of that measurement is associated with the positive operator $\Pi_a^x$. Bob acts similarly. After
several rounds, all played with this process, quantum theory predicts that the statistics
collected by the verifier are given by


$$P(a,b|x,y) = \mathrm{Tr}\left( \Pi_a^x \otimes \Pi_b^y \, \rho_{AB} \right). \tag{1.2}$$


In general, these statistics cannot be cast in the form (1.1): This is the content of Bell’s 
theorem (Bell, 1964). Explicit examples will fill this book, but for the time being let us 
accept that some shared quantum states are nonlocal resources. However, it is also well 


3 In the same vein, notwithstanding the frequent replacement of “local” with “classical” in the field’s jargon, 
the definition of Bell nonlocality does not rely on a definition of classicality, and even less on assuming the 
existence of intrinsically classical physical systems. Overall, we shall avoid speaking of classical/quantum systems 
or phenomena. It is correct to speak of classical theory and quantum theory, because these are well-defined (see 
Appendix C.1.1 for the essentials). It is also customary to speak of classical/quantum information to refer to the 
resources, insofar as described within each theory, but we won’t do it. 

4 Familiarity with elementary quantum theory is taken for granted in this book; more advanced topics and
specific aspects of quantum information theory are summarized in Appendix C. 
known⁵ that shared quantum states are not communication channels: By acting only on
her system, Alice cannot learn anything about what Bob has done with his—he could 
have measured it, kept it, discarded it, and Alice does not see any change in her statistics. 
In this sense, quantum states are no-signaling resources just as shared lists of numbers. Bell 
nonlocality is interesting and intriguing because it can be demonstrated by sharing no-signaling
resources. 
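
The no-signaling property can be illustrated numerically. The sketch below is only an example under stated assumptions: the shared state is taken to be the two-qubit maximally entangled state, the measurements are projective along arbitrarily chosen angles in the X-Z plane, and the helper names are hypothetical. It evaluates the statistics through the trace formula (1.2) and computes Alice's marginal for both of Bob's inputs.

```python
import numpy as np

def projectors(theta):
    """Projectors for outputs +1 and -1 of a qubit measurement along the
    direction at angle theta in the X-Z plane."""
    plus = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    minus = np.array([-np.sin(theta / 2), np.cos(theta / 2)])
    return {+1: np.outer(plus, plus), -1: np.outer(minus, minus)}

# Assumed shared state: (|0>|0> + |1>|1>) / sqrt(2), written as a density matrix.
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho = np.outer(phi, phi)

# Arbitrary illustrative measurement angles, one per input.
alice = {0: projectors(0.3), 1: projectors(1.1)}
bob = {0: projectors(-0.7), 1: projectors(2.0)}

def prob(a, b, x, y):
    """P(a,b|x,y) = Tr(Pi_a^x (x) Pi_b^y rho)."""
    return float(np.trace(np.kron(alice[x][a], bob[y][b]) @ rho))

def alice_marginal(a, x, y):
    """P(a|x,y): sum over Bob's outputs for a given input y of Bob."""
    return sum(prob(a, b, x, y) for b in (+1, -1))

for a in (+1, -1):
    for x in (0, 1):
        print(a, x, alice_marginal(a, x, 0), alice_marginal(a, x, 1))
```

For every output and input of Alice, the two printed marginals coincide: nothing Bob chooses is visible in Alice's local statistics.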

Or can it? Famously (or notoriously), quantum theory does not provide any recipe for 
the generation of each round’s output. Would it be possible that what quantum theory 
describes as no-signaling resources are actually signaling ones? Einstein dubbed this 
possibility “spooky action at a distance”. As we shall see in section 1.6, it is one possible 
interpretation. Among those who oppose it, some think that the wording “nonlocality” 
evokes too closely this unwelcome interpretation. What we called “locality” in subsection 
1.2.2, they’d rather call local realism or local causality. These are elegant expressions with 
philosophical appeal: They remind us that we are not merely dealing with operations 
and observations, but with a prejudice in our Weltanschauung that has been shattered. 
However, they are also not exempt from the danger of being over-interpreted.⁶

With all their potential limitations, the wordings “nonlocality,” “local realism,” and
“local (hidden) variables” have already enjoyed a few decades of tradition and are most 
probably here to stay. I hope I have said enough to prevent their misuse, and I shall use 
them freely. As for interpretations, we shall return to them in section 1.6. 


1.3 My First Bell Test: Clauser-Horne-Shimony-Holt (CHSH)


To put these general considerations on concrete grounds, we proceed to describe some 
specific examples of Bell tests. For this introductory chapter, I have chosen to present 
five classic examples: One in this section and four in the next; several others will be 
presented later in the book. An elementary proof that each is indeed a Bell test is given, 
exploiting Fine’s theorem (subsection 2.3.3) that allows considering only strategies based 
on pre-established answers. The relevance of each test for the certification of quantum 
entanglement is merely stated, leaving all the calculations for chapters 3-5. 


5 Even if this should be elementary knowledge, given the centrality of the claim for the content of this book, 
the explicit proof is given in Appendix C.1.3. 

6 It has become commonplace to split the prejudice of “local realism/causality” into two separate prejudices, 
“locality” and “realism” (or “causality”). To be at peace with the fact of a violation, it would then be enough 
to abandon either. Abandoning locality may legitimately mean signaling: In (1.1), one would have P(a|x, y, A), 
P(b|x, y, A), or both; and this modification is indeed sufficient to generate Bell nonlocality. But the meaning 
of abandoning realism/causality is by far less clear (Norsen, 2007; Gisin, 2012). It cannot mean “abandoning 
determinism”: Determinism is not assumed in (1.1), so abandoning it does not generate Bell nonlocality. 
Neither should it mean “abandoning any connection with reality,” reducing physics to unfounded speculations: 
If we abandon “local realism” it’s because we accept the verdict of observation. Probably, “abandoning 
realism/causality” is a way of saying that only statistics are speakable, see subsection 1.6.3. But then, this 
alternative is at a different level than signaling: It is not a mechanism, but the statement that no mechanism 
should be looked for. 
It should be obvious that a Bell test requires at least two players, otherwise there is
no notion of locality. For each player, there must be at least two possible inputs: If some 
players could only receive one query, those inputs would be known to the other players. 
Finally, for each input, there must be at least two possible values for the output. We start 
with this simplest scenario. 

The inputs of Alice are labeled $x \in \{0, 1\}$, her outputs $a_x \in \{-1, +1\}$ (labeling is of
course arbitrary, this choice is convenient for the calculation to come). The inputs of
Bob are labeled $y \in \{0, 1\}$, his outputs $b_y \in \{-1, +1\}$.

The rule of the game prescribes that Alice and Bob should aim at giving the same
answer whenever $(x, y) \in \{(0, 0), (0, 1), (1, 0)\}$, but opposite answers when $(x, y) = (1, 1)$.
We consider the score


$$S = \langle a_0 b_0 \rangle + \langle a_0 b_1 \rangle + \langle a_1 b_0 \rangle - \langle a_1 b_1 \rangle \tag{1.3}$$


where the average is taken over an arbitrarily large number of rounds. The maximal score 
is obviously S = 4. 

To prove that this game is a Bell test, we need to find what score can be achieved 
with local resources. Invoking Fine’s theorem, it is enough to see what happens when 
Alice and Bob have shared a pre-determined quadruple $(a_0, a_1; b_0, b_1)$ in each round.
The existence of these four numbers entails the existence of a well-defined value for the
derived quantity $s = a_0 b_0 + a_0 b_1 + a_1 b_0 - a_1 b_1$. Since the average of a sum is the sum
of the averages, $S = \langle s \rangle$ holds. Now, for every quadruple, either $s = +2$ or $s = -2$. This
is readily seen by rewriting $s = a_0 (b_0 + b_1) + a_1 (b_0 - b_1)$: Indeed, if $b_0 = b_1$ the second
term is zero, if $b_0 = -b_1$ the first term is zero. In Table 1.1 we list explicitly all sixteen
possibilities: Eight of them give $s = +2$ and the other eight $s = -2$. Notice how three out
of four pairs of inputs contribute in the same way, but the last pair pulls the sum down
(or up if it was negative). This observation will come in handy later.

In each round, the verifier sees only the pair $(a_x, b_y)$ corresponding to the inputs $(x, y)$
he has sent, so he cannot estimate $s$. However, by performing statistics conditional on
each pair of inputs, he can estimate the four $\langle a_x b_y \rangle$ and obtain $S$. Now, if $S = \langle s \rangle$ and
each instance of $s$ can only take the values $\pm 2$, it follows that


$$|S| \overset{\mathrm{LV}}{\leq} 2. \tag{1.4}$$


Suppose now that the verifier finds S > 2: He will have to admit that the players were not 
sharing pre-established quadruples, nor any resource that can be simulated with them— 
in other words, he will have to admit that they share a nonlocal resource. 
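
The counting behind the local bound is small enough to be checked exhaustively. The sketch below (function and variable names are illustrative) enumerates the sixteen pre-determined quadruples, evaluates the quantity $s$ for each, and then averages $s$ over an arbitrarily chosen mixture of quadruples to show that the resulting score stays within the bound (1.4).

```python
from itertools import product
import random

def chsh_s(a0, a1, b0, b1):
    """Single-round CHSH quantity for a pre-determined quadruple."""
    return a0 * b0 + a0 * b1 + a1 * b0 - a1 * b1

quadruples = list(product((+1, -1), repeat=4))
values = [chsh_s(*q) for q in quadruples]

count_plus = values.count(+2)
count_minus = values.count(-2)
print(sorted(set(values)), count_plus, count_minus)

# Any mixture of quadruples gives a score S = <s> between -2 and +2.
# The weights below are random and purely illustrative.
rng = random.Random(0)
weights = [rng.random() for _ in quadruples]
total = sum(weights)
S_mixture = sum(w * v for w, v in zip(weights, values)) / total
print(S_mixture)
```

Since every quadruple gives $s = \pm 2$, no choice of weights can push the average beyond 2 in absolute value.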

This Bell test is called CHSH from (Clauser, Horne, Shimony, and Holt, 1969). 
Notice how the mathematical expression of the test is an inequality, here Eq. (1.4). The 
players will convince the verifier that they have a nonlocal resource if they manage to violate
the inequality. John Bell’s original inequality (Bell, 1964) is basically the same as (1.4),
but derived under the assumption that one of the $\langle a_x b_y \rangle$ is exactly equal to 1 (Appendix
A.4). Since perfect correlations can be predicted but cannot be observed, that inequality 
Table 1.1 All sixteen quadruples of pre-established values, and derivative quantities,
for the CHSH test. In square brackets, the term that pulls the sum $s$ in the opposite direction
to the other three. Notice that the second half of the table is the mirror image of the
first half, since flipping all the signs does not change the products.


a_0, a_1; b_0, b_1    a_0 b_0   a_0 b_1   a_1 b_0   a_1 b_1     s

+1, +1; +1, +1          +1        +1        +1       [+1]      +2
+1, +1; +1, -1          +1       [-1]       +1        -1       +2
+1, +1; -1, +1          -1       [+1]       -1        +1       -2
+1, +1; -1, -1          -1        -1        -1       [-1]      -2
+1, -1; +1, +1          +1        +1       [-1]       -1       +2
+1, -1; +1, -1         [+1]       -1        -1        +1       -2
+1, -1; -1, +1         [-1]       +1        +1        -1       +2
+1, -1; -1, -1          -1        -1       [+1]       +1       -2
-1, +1; +1, +1          -1        -1       [+1]       +1       -2
-1, +1; +1, -1         [-1]       +1        +1        -1       +2
-1, +1; -1, +1         [+1]       -1        -1        +1       -2
-1, +1; -1, -1          +1        +1       [-1]       -1       +2
-1, -1; +1, +1          -1        -1        -1       [-1]      -2
-1, -1; +1, -1          -1       [+1]       -1        +1       -2
-1, -1; -1, +1          +1       [-1]       +1        -1       +2
-1, -1; -1, -1          +1        +1        +1       [+1]      +2


is sufficient to prove that quantum theory predicts nonlocal resources but is untestable in 
experiments. 

The CHSH test is the workhorse of the field; we’ll study it in great detail. Let’s just
mention that the maximal score $S = 4$ can be reached of course by signaling,⁷ but also
with a hypothetical no-signaling resource called a PR-box (Popescu and Rohrlich, 1994)
that we’ll encounter in chapter 9. However, PR-boxes exist in mathematics but don’t seem
to exist in nature: With quantum entanglement, the maximal score is $S = 2\sqrt{2} \approx 2.8284$.


7 Though rather obvious, here is one possible way in which the players can score S = 4 with signaling. Alice 
and Bob agree on a bit $a = b$. Upon being queried, Alice outputs $a$ and sends $x$ to Bob; Bob outputs $(-1)^{xy} b$.
Notice that Alice’s answer is pre-determined, but Bob’s is not: He has to wait for x before producing it. So, the 
conclusion that both outputs could not have been pre-determined holds. 


This value can be achieved by suitable measurements on the maximally entangled state
of two qubits⁸


$$|\Phi^+\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle|0\rangle + |1\rangle|1\rangle \right); \tag{1.5}$$


and in fact, in a sense to be made precise in chapter 7, only by that state and those
measurements.
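
As a quick numerical check of the quantum value, here is a minimal sketch. The measurement choice is an assumption made for illustration (the text leaves the calculation to later chapters): Alice measures the Pauli observables Z and X, Bob measures their normalized sum and difference, and the state is the maximally entangled state of two qubits.

```python
import numpy as np

Z = np.array([[1.0, 0.0], [0.0, -1.0]])
X = np.array([[0.0, 1.0], [1.0, 0.0]])

# Maximally entangled state (|0>|0> + |1>|1>) / sqrt(2).
phi_plus = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

def correlator(A, B, state):
    """Expectation value <state| A (x) B |state> for a real state vector."""
    return float(state @ np.kron(A, B) @ state)

# Assumed measurement settings, indexed by the inputs x and y.
alice_obs = [Z, X]
bob_obs = [(Z + X) / np.sqrt(2), (Z - X) / np.sqrt(2)]

S = (correlator(alice_obs[0], bob_obs[0], phi_plus)
     + correlator(alice_obs[0], bob_obs[1], phi_plus)
     + correlator(alice_obs[1], bob_obs[0], phi_plus)
     - correlator(alice_obs[1], bob_obs[1], phi_plus))
print(S, 2 * np.sqrt(2))
```

With these settings each of the four correlators has magnitude $1/\sqrt{2}$ and the signs add up constructively, so the two printed numbers agree up to rounding.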


1.4 Four more Classic Bell Tests 


This section introduces four other examples of Bell tests that should help to gain further
familiarity with these notions. 


1.4.1 Mermin’s outreach criterion 


In his effort to explain Bell nonlocality in a simple way, David Mermin (1981) conceived 
a Bell test that has become popular. There are two players, each with three inputs 
($x, y \in \{1, 2, 3\}$) and two outputs ($a, b \in \{+1, -1\}$). Let’s assume that the observation
shows that $a_j = b_j$ in all cases where the same input was chosen. We are interested in
$O = \sum_{x,y} P(a_x = b_y) = 3 + \sum_{x \neq y} P(a_x = b_y)$. If the outputs are pre-determined,
$(a_1, a_2, a_3) = (b_1, b_2, b_3)$ can take eight values, namely $(+1, +1, +1)$, $(+1, +1, -1)$, etc. For
$(+1, +1, +1)$ and $(-1, -1, -1)$, one finds $O = 9$; the other six triples give $O = 5$. Thus,
certainly $O \geq 5$ for local resources. Quantum theory predicts that one can go down to
$O = 4.5$. In particular, not even with quantum resources can one win perfectly the game
based on rule (ii) defined in subsection 1.2.1, because that would correspond to $O = 3$.

The inequality $O \geq 5$ defines a Bell test only if $a_j = b_j$. If this assumption is removed,
the inequality may be violated with LVs. For instance, the choice of pre-determined
outputs $(a_1, a_2, a_3) = (+1, +1, +1)$ and $(b_1, b_2, b_3) = (-1, -1, -1)$ gives $O = 0$. This is
the same weakness of Bell’s original criterion, which had to be transformed into CHSH 
to become robust. Robust Bell tests for two players, three inputs and two outputs will be 
discussed in section 4.2. 
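
The values of Mermin's quantity quoted above for pre-determined outputs can be reproduced by enumeration. In this sketch (names are illustrative, and inputs are indexed 0 to 2 rather than 1 to 3), the quantity is evaluated first under the constraint that both players hold the same triple, then without it.

```python
from itertools import product

def mermin_o(a, b):
    """Number of input pairs (x, y) for which the pre-determined outputs
    coincide; for deterministic lists this is the sum of P(a_x = b_y)."""
    return sum(1 for x in range(3) for y in range(3) if a[x] == b[y])

triples = list(product((+1, -1), repeat=3))

# Constrained case: identical triples, as enforced by perfect correlations
# whenever the same input is chosen.
constrained = {t: mermin_o(t, t) for t in triples}
print(sorted(constrained.values()))

# Unconstrained case: the two triples are chosen independently.
unconstrained_min = min(mermin_o(a, b) for a in triples for b in triples)
print(unconstrained_min)
```

The constrained values are 9 for the two constant triples and 5 for the other six, while dropping the constraint allows the value 0, as in the example with opposite constant triples.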


1.4.2 Greenberger-Horne-Zeilinger (GHZ) test 


The original nonlocality test by Daniel Greenberger, Michael Horne, and Anton 
Zeilinger (1989) considers four players, but the argument can be made for any number 


8 The notation $|0\rangle|0\rangle$ stands for $|0\rangle \otimes |0\rangle$. Here and in the rest of the book, the tensor product symbol is
usually implicit when writing quantum states, while it is often explicit when writing operators. I find that this 
is the choice that facilitates the reading. 
of players larger than two. Nowadays, we refer to the three-player version as the “GHZ
test” without further qualifiers (Mermin, 1990b); this is the one we present here. As 
in the CHSH case, each player has two inputs and two outputs, those of Charlie being 
labeled z and c. Assume now that the verifier observes the following perfect correlations: 


$$\langle a_0 b_0 c_1 \rangle = \langle a_0 b_1 c_0 \rangle = \langle a_1 b_0 c_0 \rangle = +1. \tag{1.6}$$


Then,


$$\langle a_1 b_1 c_1 \rangle \overset{\mathrm{LV}}{=} +1. \tag{1.7}$$


Indeed, $\langle a_x b_y c_z \rangle = +1$ means that $a_x b_y c_z = +1$ in each single round. If the outputs are
pre-determined, they must be such that $a_0 b_0 c_1 = a_0 b_1 c_0 = a_1 b_0 c_0 = +1$ in each round. By
multiplying the three conditions and noticing that $a_0^2 = b_0^2 = c_0^2 = 1$, we get $a_1 b_1 c_1 = +1$.
In quantum theory, however, one can have the correlations (1.6) alongside


$$\langle a_1 b_1 c_1 \rangle = -1. \tag{1.8}$$


This is obtained with suitable measurements on the state


$$|\mathrm{GHZ}\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle|0\rangle|0\rangle + |1\rangle|1\rangle|1\rangle \right), \tag{1.9}$$


and, just as for CHSH, this quantum realization is unique (see chapter 7).

As presented, the GHZ test relies on the observation of perfect correlations or
anti-correlations. To cope with unavoidable imperfect situations, its score can be turned into
the so-called Mermin inequality:


$$M = \langle a_0 b_0 c_1 \rangle + \langle a_0 b_1 c_0 \rangle + \langle a_1 b_0 c_0 \rangle - \langle a_1 b_1 c_1 \rangle \overset{\mathrm{LV}}{\leq} 2. \tag{1.10}$$


The proof that inequality (1.10) holds is left as Exercise 1.1. Thus, contrary to what 
happens with CHSH, the maximal score M = 4 of the Mermin inequality can in principle 
be attained in quantum theory. We shall complete the study of this inequality and its 
generalizations for more parties in section 5.2. 
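
Both claims about (1.10) can be checked numerically. The Python sketch below is an illustration added here, not a substitute for Exercise 1.1. Its first part enumerates the 64 assignments of pre-determined outputs and records the largest value of M. Its second part evaluates M on the state (1.9) for one possible choice of measurements, namely σ_y for input 0 and −σ_x for input 1 for every player. That choice is an assumption made for this example: the text only speaks of "suitable measurements". The helper names (`mermin_score`, `expectation`, `corr`) are likewise hypothetical.

```python
from itertools import product
from math import sqrt

# Local-variable side: each player fixes an output (+1 or -1) for each input.
def mermin_score(a, b, c):
    """M for one deterministic assignment; a = (a0, a1), b = (b0, b1), c = (c0, c1)."""
    return (a[0] * b[0] * c[1] + a[0] * b[1] * c[0]
            + a[1] * b[0] * c[0] - a[1] * b[1] * c[1])

pairs = list(product((+1, -1), repeat=2))
lv_scores = [mermin_score(a, b, c) for a in pairs for b in pairs for c in pairs]
lv_max = max(lv_scores)

# Quantum side: a three-qubit state stored as {(i, j, k): amplitude}.
ghz = {(0, 0, 0): 1 / sqrt(2), (1, 1, 1): 1 / sqrt(2)}

# Assumed measurement choice (illustrative): input 0 -> sigma_y, input 1 -> -sigma_x.
SIGMA_Y = ((0, -1j), (1j, 0))
MINUS_SIGMA_X = ((0, -1), (-1, 0))
OBS = (SIGMA_Y, MINUS_SIGMA_X)

def expectation(ops, state):
    """<state| A (x) B (x) C |state> for single-qubit matrices ops = (A, B, C)."""
    total = 0
    for bra, bra_amp in state.items():
        for ket, ket_amp in state.items():
            elem = 1
            for op, i, j in zip(ops, bra, ket):
                elem *= op[i][j]          # matrix element <i| op |j>
            total += complex(bra_amp).conjugate() * elem * ket_amp
    return total.real

def corr(x, y, z):
    """Three-party correlator for inputs (x, y, z) on the GHZ state."""
    return expectation((OBS[x], OBS[y], OBS[z]), ghz)

quantum_m = corr(0, 0, 1) + corr(0, 1, 0) + corr(1, 0, 0) - corr(1, 1, 1)

print("largest M with pre-determined outputs:", lv_max)
print("M on the GHZ state:", round(quantum_m, 6))
```

Since M is linear in the statistics, the largest value found among deterministic assignments is also the largest value reachable by any mixture of them.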


1.4.3 Hardy’s test 


At this point, one may ask if one can build a Bell test for two players on extreme 
correlations that quantum resources can in principle achieve. There are indeed such 
examples. The first one was found by Lucien Hardy (1992; 1993). The extreme 
probabilities that are enforced are: 


$$P(a_0 = +1, b_0 = +1) = 0 \tag{1.11}$$

$$P(a_0 = -1, b_1 = +1) = 0 \tag{1.12}$$

$$P(a_1 = +1, b_0 = -1) = 0. \tag{1.13}$$

The following inferences are then obvious: 


• From (1.12): $b_1 = +1$ implies $a_0 = +1$ 
• From (1.11): $a_0 = +1$ implies $b_0 = -1$ 
• From (1.13): $b_0 = -1$ implies $a_1 = -1$. 


By chaining these three inferences we are led to a fourth one, namely “$b_1 = +1$ implies 
$a_1 = -1$,” and thus in particular to the prediction 


$$P(a_1 = +1, b_1 = +1) = 0. \tag{1.14}$$


Another way of reaching the same conclusion is suggested in Exercise 1.2. 

In subsection 4.4.1, we shall see that in quantum theory one may have the three 
constraints (1.11)-(1.13) and nonetheless $P(a_1 = +1, b_1 = +1) > 0$ with a maximal 
value of approximately 0.09. This looks absurd, since chaining the three bulleted 
inferences looks innocuous; but it is not: Just as in the derivation of the CHSH 
inequality, the chaining assumes that one can speak of both $a_0$ and $a_1$, and of both $b_0$ 
and $b_1$. Rigorously, the first inference should read: If b = +1 was found for y = 1, we 
know that a = +1 will be found if the input x = 0 is called; and similarly for the others. 
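
The chain of inferences can also be checked by brute force. The following Python sketch (an added illustration; the names `allowed`, `survivors`, and `hardy_events` are hypothetical) lists all sixteen pre-determined quadruples, keeps those that never produce the three forbidden events (1.11)-(1.13), and looks among them for one with $a_1 = +1$ and $b_1 = +1$.

```python
from itertools import product

def allowed(a0, a1, b0, b1):
    """True if a pre-determined quadruple never produces a forbidden event."""
    return not (
        (a0 == +1 and b0 == +1)      # forbidden by (1.11)
        or (a0 == -1 and b1 == +1)   # forbidden by (1.12)
        or (a1 == +1 and b0 == -1)   # forbidden by (1.13)
    )

# Quadruples are ordered as (a0, a1, b0, b1).
survivors = [q for q in product((+1, -1), repeat=4) if allowed(*q)]

# Among the survivors, look for the event a1 = +1 and b1 = +1.
hardy_events = [q for q in survivors if q[1] == +1 and q[3] == +1]

print("quadruples compatible with (1.11)-(1.13):", survivors)
print("of which have a1 = +1 and b1 = +1:", hardy_events)
```

Any strategy with pre-determined outputs that respects the three constraints can only mix the surviving quadruples, so an empty second list is the statement (1.14).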


1.4.4 The Magic Square 


The rules of the Magic Square test⁹ (Cabello, 2001b; Cabello, 2001a; Aravind, 2002) 
are slightly more complex. The test has three possible inputs per player, x, y ∈ {1, 2, 3}; 
the players are asked to output three bits each: $a_x = (a_x^1, a_x^2, a_x^3)$, $b_y = (b_y^1, b_y^2, b_y^3)$. These 
outputs should ideally satisfy the following conditions: 


$$\prod_j a_x^j = +1, \quad \prod_k b_y^k = -1, \quad \text{and} \quad a_x^y = b_y^x. \tag{1.15}$$

To see that these conditions are impossible to fulfil perfectly with pre-determined outputs, 
let us arrange the nine pre-determined bits in a 3 × 3 square. Upon being queried, Alice 
outputs the x-th row of her square, and Bob the y-th column of his. The third condition 
says that the value at the intersection should be the same for every call (x, y) of the 
verifier, which means that the squares must be identical. However, the first condition 
implies $\prod_{x,j} a_x^j = +1$, the second $\prod_{y,k} b_y^k = -1$. If these are to be enforced, Alice’s and 
Bob’s 9-bit squares should differ in at least one bit; and then, if the verifier calls 
precisely those inputs, he will see that the third condition fails. 


9 The square made its first appearance as a test for single-player “contextuality” (see Appendix D.3 for this 
notion) in works of N. David Mermin and Asher Peres. Because of this, it is often called Mermin-Peres Magic 
Square. Here I cite the subsequent works that introduced it as a two-player nonlocality test. 

All in all, with pre-determined outputs, Alice and Bob can satisfy the three conditions 
(1.15) for at most 8/9 of the rounds on average; but there exists a quantum state and 
measurements that can fulfil them perfectly, as we shall show in subsection 4.4.2. 
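
The figure 8/9 can be recovered by exhaustive search. In the Python sketch below (an added illustration with hypothetical names), Alice's strategy is a choice of three rows with product +1, Bob's a choice of three columns with product −1, and `wins` counts the input pairs (x, y) on which the intersection values agree. Indices run from 0 to 2 rather than from 1 to 3.

```python
from itertools import product

TRIPLES = list(product((+1, -1), repeat=3))
ROWS = [t for t in TRIPLES if t[0] * t[1] * t[2] == +1]   # Alice: product +1
COLS = [t for t in TRIPLES if t[0] * t[1] * t[2] == -1]   # Bob: product -1

def wins(alice, bob):
    """Number of input pairs (x, y) with a_x^y == b_y^x.

    alice[x] is the row Alice outputs on input x,
    bob[y] is the column Bob outputs on input y.
    """
    return sum(alice[x][y] == bob[y][x] for x in range(3) for y in range(3))

# Exhaustive search over the 64 x 64 pairs of deterministic strategies.
best = max(
    wins(alice, bob)
    for alice in product(ROWS, repeat=3)
    for bob in product(COLS, repeat=3)
)

print("best number of satisfied input pairs out of 9:", best)
```

With inputs drawn uniformly, the best deterministic strategy therefore satisfies the third condition in a fraction `best / 9` of the rounds, and mixing deterministic strategies cannot improve on that.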


1.5 A Closer Scrutiny: Addressing Loopholes 


As we have just seen, Bell tests can be described in very elementary terms. But is this not 
too elementary, especially given the strong conclusions that are reached? Over the years, 
Bell tests have been submitted to tight scrutiny, searching for flaws, or loopholes, in the 
reasoning or in the implementations. 

The four possible loopholes that have been identified turn out to be very different 
from each other: Some are mere technical issues (that have been fixed), others border on 
philosophy and can be closed only under reasonable assumptions (in other words, there 
is a price to pay if one wants to believe that they are still open). We review them here in 
this order; their working will be illustrated with the CHSH test. 


1.5.1 The “memory loophole,” or doing proper statistics 


The memory loophole is related to statistics. Basic statistics assumes that rounds of a test 
are independent and identically distributed (i.i.d.), but this i.i.d. assumption is obviously 
unwarranted when it comes to such fundamental tests. Would it be possible for the 
players to give a false positive in a Bell test, i.e., violate a Bell inequality with local 
resources, by adopting a non-i.i.d. strategy, that is, by choosing the process to be used in 
round r based on all that has happened in the previous rounds? The answer is no. 

To appreciate why, consider one round of the CHSH test. The players must choose 
the pre-established quadruple of values to be used in that round. They can base their 
choice on whatever piece of information from the past: There will always be one pair 
of inputs which pulls the sum in the wrong direction (see Table 1.1), and the verifier 
might have picked precisely that pair; so |S| ≤ 2 still holds. This simple argument shows 
that there is only one way for the players to generate a false positive: Avoid the wrong 
pair of inputs, either by refusing to answer or by colluding with the verifier. These are, 
respectively, the fair-sampling loophole and the free-will loophole to be described. 
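
The single-round argument rests on a finite check, sketched below in Python as an added illustration. The sign convention is an assumption here: S is taken with a minus sign on the term for x = y = 1, and the helper names are hypothetical. For each of the sixteen pre-established quadruples, the code computes the single-round value of the CHSH sum and counts the input pairs whose term has the wrong sign, both for players aiming at positive S and for players aiming at negative S.

```python
from itertools import product

INPUT_PAIRS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def sign(x, y):
    """Sign with which a_x * b_y enters S (assumed: minus for x = y = 1)."""
    return -1 if (x, y) == (1, 1) else +1

def s_value(a, b):
    """Single-round CHSH sum for pre-established a = (a0, a1), b = (b0, b1)."""
    return sum(sign(x, y) * a[x] * b[y] for x, y in INPUT_PAIRS)

def wrong_pairs(a, b, target):
    """Input pairs whose term does not have the sign the players aim for."""
    return [(x, y) for x, y in INPUT_PAIRS if sign(x, y) * a[x] * b[y] != target]

quadruples = [
    (a, b)
    for a in product((+1, -1), repeat=2)
    for b in product((+1, -1), repeat=2)
]

s_values = sorted({s_value(a, b) for a, b in quadruples})
fewest_wrong = min(
    len(wrong_pairs(a, b, target))
    for a, b in quadruples
    for target in (+1, -1)
)

print("possible single-round values of S:", s_values)
print("fewest wrong input pairs over all quadruples:", fewest_wrong)
```

However the players use their memory of past rounds, they end up selecting one of these sixteen quadruples, so a verifier who draws the inputs uniformly hits a wrong pair with probability at least 1/4 in every round.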

Thus, even if we initially derived our Bell inequality thinking in i.i.d. terms, we have 
proved that there is no memory loophole if the Bell test is infinitely long and the verifier 
can extract perfect statistics. In a real test, when only finitely many rounds are possible, 
the Bell test must be phrased as hypothesis testing: How likely is the observed string of 
outputs assuming that the players are using a local resource? For such likelihood bounds, 
non-i.i.d. estimators must indeed be used instead of the familiar i.i.d.-based Gaussian 
standard deviations. We’ll get back to this point in section 2.6. 
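
To give a flavor of such a bound, here is a minimal Python sketch. It is one common way of phrasing the test and is offered as an assumption, not as the method developed in section 2.6. Each round is treated as a game that the players win when the term for the chosen inputs has the right sign. If the inputs are uniform and independent of the past, players using a local resource win each round with probability at most 3/4, whatever memory they exploit, so the chance of reaching a given number of wins is no larger than for a binomial variable with success probability 3/4. The function name `p_value` and the constant `LOCAL_WIN_PROB` are hypothetical.

```python
from fractions import Fraction
from math import comb

# Largest per-round winning probability with pre-established outputs
# and uniformly chosen inputs (assumed setting, see the text above).
LOCAL_WIN_PROB = Fraction(3, 4)

def p_value(rounds, wins):
    """Upper bound on the probability that local players win at least
    `wins` out of `rounds` rounds, computed as an exact binomial tail."""
    tail = sum(
        comb(rounds, k)
        * LOCAL_WIN_PROB ** k
        * (1 - LOCAL_WIN_PROB) ** (rounds - k)
        for k in range(wins, rounds + 1)
    )
    return float(tail)

if __name__ == "__main__":
    for wins in (75, 80, 85, 90):
        print(wins, "wins out of 100:", p_value(100, wins))
```

The bound needs no Gaussian approximation and makes no assumption that the rounds are identically distributed; it only uses the per-round limit on the winning probability.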
1.5.2 The “detection loophole(s),” or the dangers of post-selection 


If in an exam the students were allowed to decline to answer till they are asked a question 
of their liking, the average score would certainly increase. The same happens for 
nonlocality. In the CHSH test played with local strategies, we have seen that only one pair 
of inputs (x, y) pulls the value of S in the wrong direction: If the players were allowed to 
decline answering when they receive that specific query, the verifier would never catch 
them at fault. There is only one subtlety: Neither of the players knows the pair (x, y), so the 
decision to decline must be made locally, based on x or on y alone. A possible strategy 
in which Alice always answers while Bob is in charge of declining is given in Table 1.2. 
We refer to Exercise 1.3 for a thorough study of this strategy, and to Appendix B.3 for a 
more rigorous quantitative approach. 
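
A strategy of this kind can be evaluated exactly. The Python sketch below is an added illustration under stated assumptions: S carries a minus sign on the term for x = y = 1, the players pick uniformly among the eight quadruples with s = +2, and Bob declines on the input y that enters the wrong-sign term of the chosen quadruple. All helper names are hypothetical. The code computes Bob's answer rate and the value of S estimated only from the rounds in which he answers.

```python
from fractions import Fraction
from itertools import product

INPUT_PAIRS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def sign(x, y):
    """Sign with which a_x * b_y enters S (assumed: minus for x = y = 1)."""
    return -1 if (x, y) == (1, 1) else +1

def s_value(a, b):
    return sum(sign(x, y) * a[x] * b[y] for x, y in INPUT_PAIRS)

# The eight quadruples of pre-established values with s = +2.
strategies = [
    (a, b)
    for a in product((+1, -1), repeat=2)
    for b in product((+1, -1), repeat=2)
    if s_value(a, b) == 2
]

def declined_input(a, b):
    """Bob's input y that enters the unique wrong-sign term of the quadruple."""
    for x, y in INPUT_PAIRS:
        if sign(x, y) * a[x] * b[y] == -1:
            return y

answer_rate = {}
post_selected = {}
for x, y in INPUT_PAIRS:
    # Bob's decision uses only y and the quadruple agreed upon beforehand.
    answered = [(a, b) for a, b in strategies if declined_input(a, b) != y]
    answer_rate[(x, y)] = Fraction(len(answered), len(strategies))
    post_selected[(x, y)] = Fraction(
        sum(a[x] * b[y] for a, b in answered), len(answered)
    )

s_post = sum(sign(x, y) * post_selected[(x, y)] for x, y in INPUT_PAIRS)

print("Bob's answer rate per input pair:", answer_rate)
print("S estimated on answered rounds only:", s_post)
```

The post-selected value is computed from a purely local strategy, which is why the verifier cannot draw conclusions from answered rounds alone.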

The fix for this loophole is clear: The verifier must elicit an answer in every round. This 
does not sound like a big deal, but it may be. Let us look at this loophole from the 
perspective of an honest experimentalist who possesses a very good nonlocal resource 
(say, a source of photons entangled in polarization) and very accurate measurement 
devices, but whose detectors have poor efficiency. If she is obliged to produce outputs in 
every round, in most of the rounds she’ll have to produce a dummy output because the 
detectors won’t have fired. Nonlocality is quickly washed out, and likely the observed 
data will be compatible with a local resource. 

The situation is all the more annoying because this loophole has a very conspiratorial 
character. As John Bell himself stressed, and many after him, quantum theory provides 
a very accurate description of such an experiment. It’s very hard to believe that this 
accuracy is just an accident due to detectors being inefficient. Besides, the mechanism 
of the loophole assumes that a detector’s firing depends on the input of the Bell test 
chosen in each round (to continue with the example, the polarization basis chosen for the 
measurement). But experimentalists know that the detector’s firing depends on internal 
parameters: With respect to the inputs of the Bell test, the detector is performing a fair 
sampling. 


Table 1.2 A strategy exploiting the detection loophole for the CHSH test. 
In each round, Alice and Bob choose one of the eight quadruples of 
pre-established values that would give s = +2 (cf. Table 1.1). Alice 
always answers, while Bob declines to answer to the input indicated by 
brackets, in such a way that the problematic output (marked N*) is never 
produced. N indicates that no answer is given. 

a0, a1; b0, b1        a0b0    a0b1    a1b0    a1b1 
+1, +1; +1, [+1]      +1      N       +1      N* 
−1, −1; −1, [−1]      +1      N       +1      N* 
+1, +1; +1, [−1]      +1      N*      +1      N 
−1, −1; −1, [+1]      +1      N*      +1      N 
+1, −1; [+1], +1      N       +1      N*      −1 
−1, +1; [−1], −1      N       +1      N*      −1 
+1, −1; [−1], +1      N*      +1      N       −1 
−1, +1; [+1], −1      N*      +1      N       −1 

This is why, rather than claiming failure, all the early experiments have reported the 
observation of nonlocality under the fair-sampling assumption. Most likely, several future 
experiments will legitimately continue to do so. That being said, it’s also important to put 
on record that nonlocality without the fair-sampling assumption has been observed, first 
in an experiment with entangled ions (Rowe et al., 2001), then in several other platforms, 
including of course the three “loophole-free” experiments of 2015 cited previously. 

Other loopholes related to the process of detection have been identified for some 
specific implementations: We refer the interested reader to the comprehensive review 
by Larsson (2014). Ultimately, they can all be closed by a rigorous implementation of 
the rule “one output for every input.” 

In summary, although they may prove challenging for some platforms, the detection 
loopholes can be closed and are therefore not a threat for the certification of nonlocality. 


1.5.3 The “free-will loophole,” or measurement independence 


Instead of allowing the players to decline answering, the verifier could reveal some 
information about the inputs he is going to send out in every round. In the CHSH test, 
it is enough to inform the players about a pair of inputs that won’t be used in a given 
round (in fact, more than enough: See Exercise 1.4). 
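
How little information the players need can be made explicit. In the Python sketch below (an added illustration; the sign convention with a minus on the term for x = y = 1 is assumed, and the names are hypothetical), `quadruple_avoiding` searches, for each input pair announced as unused, for a pre-established quadruple whose terms have the right sign on the three remaining pairs.

```python
from itertools import product

INPUT_PAIRS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def sign(x, y):
    """Sign with which a_x * b_y enters S (assumed: minus for x = y = 1)."""
    return -1 if (x, y) == (1, 1) else +1

def quadruple_avoiding(excluded):
    """Pre-established (a, b) with the right sign on every pair but `excluded`."""
    for a in product((+1, -1), repeat=2):
        for b in product((+1, -1), repeat=2):
            if all(
                sign(x, y) * a[x] * b[y] == +1
                for x, y in INPUT_PAIRS
                if (x, y) != excluded
            ):
                return a, b
    return None

# One quadruple per possible announcement of the verifier.
cheat_sheet = {pair: quadruple_avoiding(pair) for pair in INPUT_PAIRS}

for excluded, (a, b) in cheat_sheet.items():
    print("pair not used:", excluded, "-> a =", a, ", b =", b)
```

Players who consult such a list in every round contribute a term of the right sign each time, so the verifier would estimate S = 4 from purely local resources.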

This possibility sounds as artificial as the previous one, because the verifier has no 
reason to reveal that information. But there is a crucial difference. In the case of the 
detection loophole, it is easy to enforce an answer in every round (if the players refuse to 
comply, the verifier can fill in the answer himself, and too bad for them). Here, the verifier 
has to ensure that no information leaks out to the players. As frequent reports of leakage 
and hacking confirm, this is notoriously much more difficult to check, and ultimately 
impossible to guarantee in an absolute way. 

In the laboratory setting, this loophole is open if the preparation of the system to 
be measured is correlated with the measurements that are going to be performed. 
This possibility is called measurement dependence; though less frequently than hacking, 
it has also been in the news.¹⁰ The ultimate form of measurement dependence is 
super-determinism; leaving metaphysics for section 1.6, let us assume that measurement 
independence is possible in principle, and ask: How can one try and enforce it? 


10 In 2015 it was discovered that a car manufacturer had programmed some models to detect the specific 
procedures of an anti-pollution test. The car’s emissions would then be lowered in order to pass that test, before 
returning to the normal, environment-unfriendly settings. As we see, measurement dependence exists; but it is 
rarely the result of an accident and is usually taken as the evidence of conscious tampering. 
For some, the ultimate enforcement of measurement independence would be to 
choose the inputs using human free will; whence the name of “free-will loophole.” 
I prefer to leave such a delicate notion as free will out of the picture.¹¹ At any rate, all that 
is required is to choose the inputs with a process that is very unlikely to be correlated with 
the resources shared by the players.¹² Some have gone as far as to generate the inputs 
of a Bell test from fluctuations of radiation coming from very distant stellar objects, i.e., 
produced by matter that has not been in contact with earthly matter since inflation, if 
ever (Handsteiner et al., 2017). Impressive for a physicist, this choice of process may 
not be convincing for a technological skeptic: Who guarantees that those telescopes and 
electronics are really producing stellar randomness? This is why others have argued that 
the best way to approach the free-will loophole is to generate the inputs from the letters 
of one’s favorite book or the Geneva phonebook (Pironio, 2015). Surely there are several 
ways in which these inputs may not be called random: There is a structure in the text, and 
the information has been available in the Universe for quite some time—still, in order to 
refuse the evidence of Bell nonlocality, our skeptics must now believe that the behavior 
of some physical systems is correlated to the text of a book. If someone does not find this 
insane, there is little chance that they can be convinced anyway. 

In summary: Whether viewed as leakage of information from a verifier, or as hidden 
correlations among the devices in a laboratory, we are in the presence of a loophole 
that can only be closed under reasonable assumptions. In this book, we shall always assume 
that measurement independence holds till section 11.4, where we shall see that one can 
relax it partially and still be able to certify Bell nonlocality. Finally, for interpretational 
matters it is crucial to stress that measurement independence is compatible with determinism: 
It requires that there exist several uncorrelated chains of events, but each chain can 
be deterministic. In other words, by assuming measurement independence, we are not 
introducing indeterminism by fiat, just as we are not required to accept any modality of 
human free will in order to certify nonlocality. 


1.5.4 The “locality loophole,” or hidden communication channels 


The last loophole has a different flavor than the previous ones. It does not aim at 
generating a false positive, but a trivial positive: If the nonlocal resource could be 
communication, one says that the “locality loophole” is open. Closing the loophole would 
mean to design a Bell test, in such a way as to certify nonlocality while guaranteeing 
that no communication was happening. This seems to be possible using a physical fact: 
Information propagates at a speed bounded by that of light in vacuum. 


11 I cannot resist referring the reader to an introductory book that reviews the debate on free will, accessible 
to beginners like us physicists (Griffith, 2013). 

12 If the process that is used to choose the inputs is pseudo-random, it is a finite-length algorithm and 
ultimately the players would be able to learn it (Bendersky et al., 2016). I cite this in a footnote because, 
important as it is conceptually, this possibility makes no difference in practice: Real Bell tests achieve excellent 
statistical significance in far fewer rounds than would be needed to guess even a moderately short algorithm. 
Figure 1.2 The space-time configuration in which the locality loophole is closed, with the dotted lines 
representing the light cones. Information about x may arrive at the location of Bob only after the output 
b has been produced, and information about y may arrive at the location of Alice only after the output a 
has been produced. There is no absolute way of determining when the information about the inputs was 
created, nor when a definite output is produced: The events that are relevant to close the locality loophole 
can only be defined under reasonable assumptions. 


Recall that the information that needs to be communicated is the inputs. For every 
player p, we denote by $x^{(p)}$ their position, by $t_{\mathrm{in}}^{(p)}$ and $t_{\mathrm{out}}^{(p)}$ the times at which the input 
is received, respectively at which the output is produced. The verifier wants to enforce 
that each player produce their output before any information about the other players’ 
inputs may have arrived at their location: In jargon, space-like separation between the event 
“output” of each player and the events “input” of all the other players. This requirement 
reads¹³ $|x^{(p)} - x^{(p')}| > c\,(t_{\mathrm{out}}^{(p)} - t_{\mathrm{in}}^{(p')})$ for all pairs of players $(p, p')$ (Figure 1.2). 
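
As a numerical illustration of this requirement, the Python sketch below checks the condition for every ordered pair of players. It is an added example under simplifying assumptions: the players sit on a single line, so that positions are plain numbers, and the positions and times used at the end are hypothetical values chosen only to show one passing and one failing configuration.

```python
from itertools import permutations

C = 299_792_458.0  # speed of light in vacuum, in m/s

def locality_closed(players):
    """Check space-like separation between each player's output event
    and every other player's input event.

    players: {name: (position_m, t_in_s, t_out_s)}, with positions taken
    along one line for simplicity (an assumption of this sketch).
    """
    for p, q in permutations(players, 2):
        x_p, _, t_out_p = players[p]
        x_q, t_in_q, _ = players[q]
        # Information about q's input must not reach p before p's output.
        if not abs(x_p - x_q) > C * (t_out_p - t_in_q):
            return False
    return True

# Hypothetical numbers: 1.3 km apart, inputs at t = 0.
fast = {"Alice": (0.0, 0.0, 3e-6), "Bob": (1300.0, 0.0, 3e-6)}
slow = {"Alice": (0.0, 0.0, 5e-6), "Bob": (1300.0, 0.0, 5e-6)}

print("outputs after 3 microseconds:", locality_closed(fast))
print("outputs after 5 microseconds:", locality_closed(slow))
```

Light needs a little more than 4.3 microseconds to cover 1.3 km, which is why the first configuration passes and the second does not. The check takes the input and output times as given; how those events are to be identified is a separate question.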

Can one find flaws in this argument? Some may question the physical assumption: 
Maybe some information can propagate faster than light. We shall discuss this option as 
interpretation in subsection 1.6.1, then as attempted models in chapter 11. But there is a 
more subtle possible flaw that went unnoticed for decades: In order to run an argument 
based on space-like separation, one must also be able to identify the relevant events. Alice 
can say when and where the input was fed into her devices; but she can’t say when 
and where the information about the input was created in the Universe. Similarly, Alice 
can estimate when an electric current left the detector to convey the information to 
the computer; but is that the time at which the actual result is created, the time of the 
“collapse”? 


13 For notational simplicity, we assume that the relative positions of the players are fixed, which is usually the 
case in implementations. Also notice that, since the condition of space-like separation is Lorentz-invariant, we 
did not need to specify in which frame events are parametrized. 
In summary, the locality loophole can be closed for known communication channels 
between the players, or by adopting operational definitions of the events, as several 
experiments did.¹⁴ Ruling out any unknown form of communication seems to be 
impossible: Even if the speed of communication is believed to be bounded, we wouldn’t 
know which are the relevant events. 


1.5.5 The unknown loophole: Skepticism 


It should be clear that Bell nonlocality has been scrutinized with great rigor. Some 
die-hard skeptics are not convinced: Every now and then, someone claims to have found the 
flaw in the argument. These claims are usually based on very convoluted arguments and 
tend to fall into three categories: Utterly wrong (for instance, the alleged counterexample 
is just a variation of the detection loophole); exegeses of Bell’s papers (whether Einstein, 
Bell, or anyone else was right or wrong is interesting for the history of science, but science 
should be judged without reference to their authority); deep discussions on the meaning 
of probability (some of which may hit home but would apply to every statistical statement 
and not only to the certification of Bell nonlocality: After all, the philosophy of probability 
and the cogency of statistical conclusions are still debated). 

To be sure, by definition one cannot exhaust the list of possible loopholes, and science 
should always be open to revision. But at this stage, I strongly believe that the burden 
of the proof should be on the deniers. If anyone has found the flaw, they should be 
able to write the corresponding algorithm, take two computers that have been 
pre-programmed together but do not communicate during the rounds of the game, and 
exhibit a statistically significant violation of a Bell inequality [see e.g., section 9 of (Gill, 
2014)]. In the presence of such evidence, all physicists of the “establishment” will be 
ready to reconsider the matter. 


1.6 Experimental Metaphysics? 


There is abundant observational evidence for nonlocality with all the loopholes closed: 
the memory and detection loopholes, indisputably; the free-will and locality loopholes, 
up to assumptions that go unquestioned in virtually all the rest of science, and have been 
noticed in this context only because of the strength of the claim. So, we are in the presence 
of a phenomenon that calls for interpretation. 

We may want to start by refocusing on the dilemma. It is the same dilemma for any 
form of nonlocality, but let us just refer to the GHZ test: Given the outputs of two of 
the players, there is only one possible output for the third; and nevertheless, that output 
was not predetermined. When was it determined then, and how? 


14 The definition of the events is related to electric signals: For the input, when a signal leaves the “random 
number generator” to reach the measurement device; for the output, when the electric signal leaves the detector 
to propagate to the computer where the information will be stored. The locality loophole was first addressed in 
(Aspect, Dalibard, and Roger, 1982a), but the first experiment using a more proper random number generator 
was performed years later in the group of Anton Zeilinger (Weihs et al., 1998). A few months earlier, the 
group of Nicolas Gisin had put the emphasis on the distance rather than on the timing of the random number 
generation, observing Bell nonlocality between players separated by 10 km (Tittel et al., 1998). 

Many positions have been put forward, often overlapping with one’s interpretation of 
quantum theory. For the purpose of this book, I have chosen to classify the options in 
four groups. This being my own classification, to avoid both canonizations and imprecise 
attributions I have decided not to insert any citation in what follows. For further reading, 
one can start with some essays of very different, often opposite flavor published on 
the occasion of the 50th anniversary of Bell’s theorem (Fuchs et al., 2014; Maudlin, 
2014; Werner, 2014; Wiseman, 2014; Zukowski and Brukner, 2014). There are also two 
systematic treatises on the meaning of Bell nonlocality, by Tim Maudlin (2011) and 
Jeffrey Bub (2015); and two popular books by Anton Zeilinger (2010) and Nicolas Gisin 
(2014)—and of course, a famous collection of John Bell’s reflections (Bell, 2004). 


1.6.1 Group 1: Nonlocal hidden variables 


The easiest position to describe is that of those who infer from Bell nonlocality the 
existence of nonlocal hidden variables. With this position, one recovers determinism by 
staying within a mechanical paradigm: That is, one can simulate how nature processes 
information in order to produce the outputs. 

What would these nonlocal hidden variables be? A form of communication 
(superluminal or even retrocausal, i.e., propagating to the past), the infinitely rigid quantum ether 
of Bohmian mechanics, a connection in an unknown dimension ... Whatever they may 
be, in our (3 + 1)-dimensional space-time they would appear as “influences” carrying 
information from one location to the other, which Einstein famously dubbed “spooky 
action at a distance” in his debates with Bohr.¹⁵ 

Now, these hypothetical influences would carry information, only to tweak it in such a 
way that it looks no-signaling to us. With a positive wink, Shimony called it “peaceful 
coexistence with relativity”. More negative critics rather highlight the conspiratorial 
flavor: Why would nature use a signal while hiding its use from us? Both this fine tuning 
and the relation with relativity will be discussed in detail in chapter 11. In particular, there 
we shall see a quantitative result: In order to remain “hidden,” these influences must 
propagate at an infinite speed in their preferred frame. 


1.6.2 Group 2: Superdeterminism and its friends 


In the second group, I shall put superdeterminism and stances that (at least in my view) 
are akin to it. 

In subsection 1.5.3, we have seen that Bell nonlocality can be demonstrated as soon 
as the processes that choose the inputs and those chosen by the players are independent. 
The strongest way to deny this measurement independence is superdeterminism: All 
the events in the Universe constitute a single, deterministic process. There could be 
somewhat milder ways of denying measurement independence: For instance, worldviews à 
la The Matrix in which our Universe is a big simulation, maybe not deterministic in origin 
(it could be run by aliens using their true free will). Needless to say, the consequences of 
adopting such a worldview extend far beyond solving the conundrum of Bell nonlocality. 


15 The first record of this expression seems to date from the 1927 Solvay Conference. 

Another option, directly inspired by quantum theory, is that of the so-called many-worlds 
interpretations that deny that a definite output is ever singled out. These interpretations 
say that the reversible dynamics of quantum theory is an accurate depiction of the 
deepest reality, which we do not perceive with our senses but have discovered with 
our investigations. What is usually deemed irreversible in an elementary reading of 
the quantum formalism, namely the act of measurement, is nothing else than getting 
entangled with the apparatus, and then with the environment, with the hard disk that 
stores the data, with the consciousness of the players ... The players will perceive definite 
outputs, and the verifier will certify Bell nonlocality after many rounds, because that’s 
how the rules are set: In every world, there is indeterminacy and nonlocality. But nature 
is playing all the options, and this deployment of correlations is fully deterministic. 


1.6.3 Group 3: Only statistics are speakable, a.k.a. the “orthodox” 


Like the many-worlds interpretations, the third group also asserts the correctness of 
quantum formalism, but in a very different way. Here, quantum theory is the correct 
way of computing probabilities in the (only) physical world—and there is nothing else 
we should talk about. Individual rounds of a Bell test (or of any other experiment: 
interferometers, Stern-Gerlach ...) are “unspeakable.” 

As some of my colleagues like to say, this is “just standard quantum mechanics”: 
Indeed, it is what people identify as the orthodox interpretation. But it should be 
mentioned that it implies a significant epistemological discipline. Physics is generally 
understood as a representation of nature, a study of the constituents of matter and their 
dynamics — loosely speaking, it should describe “how nature does it.” However, a long 
list of philosophers may find this stance too naive, and the intrinsically statistical character 
of quantum theory has won several physicists over to the idea that a law of nature may 
rather be a way of organizing our knowledge. Bayesianism becomes the proper language: 
Probabilities capture someone’s degree of belief. A law of physics does not prescribe the 
belief itself, which is subjective, but how beliefs should evolve given new information 
(and it is perfectly acceptable that these updating rules be not subjective). 

If someone adopts this approach to physics and knowledge, the “intrinsic 
indeterminacy” of quantum theory adds only a minor element of discomfort: At some point, agents 
have to give up the possibility of a more refined description, one leading to stronger 
beliefs. As for the description of Bell tests, contrary to the many-worlds interpretation, 
the definite outputs are real and the agent has a special role. 

The strength and weakness of this position are both simultaneously evident in the way 
it deals with the GHZ test: One refuses to explain how the third player manages to give 
that unique answer, but stresses that an agent should definitely bet on that unique answer, 
if informed about the other two. This position makes pragmatically correct statements 
and avoids all the problems. For some it’s wisdom, for others escapism. 
1.6.4 Group 4: Hoping for collapse 


This last position, in a sense, closes the circle. In this view, the outputs are real (contrary 
to some in Group 2), they are produced through some intrinsically random process 
(contrary to Groups 1 and 2) and through Bell nonlocality we are learning about this 
process that does happen in nature (contrary to Group 3). 

The challenge here is to describe the process, usually called “collapse,” by which the 
output of each round is generated. There have been several attempts, but none seems 
to be fully convincing. Because of Bell nonlocality, any collapse model will have to be 
nonlocal to describe bipartite or multipartite statistics: The process that generates one 
player’s output must take into account other players’ inputs. But it seems inescapable 
that collapse models exhibit some form of nonlocality for single-player processes too: 
In a measurement of position, the particle must localize itself somewhere, whence the 
possibility of finding it elsewhere should fall to zero; if a single photon is sent through a 
beam-splitter and found in one of the beams, it must become impossible to find it in the 
other beam. 


1.6.5 Additional remarks 


I have just tried a simple systematization of the current state of interpretations. This 
matter is complex and can be approached from several angles. I’ll go through a few more 
viewpoints here. 

Let us first address the question of whether Bell nonlocality is experimental metaphysics 
that shapes our Weltanschauung. Group 1 would certainly claim so. For Groups 2 and 
3, Bell nonlocality is just one of the rules that have been set up and does not play 
a foundational role (when it comes to Group 2, very few facts can claim to play a 
foundational role in a deterministic worldview). For Group 4, collapse models have 
first been studied to explain the appearance of a classical macroscopic world, but Bell 
nonlocality is something that such models are urged to explain too. At any rate, whatever 
position one reaches after reflection, Bell nonlocality must have entered that reflection: 
Nobody brushes it off as irrelevant a priori. 

Next, I find it interesting to compare these interpretations in terms of resources and 
information. We don’t have a recipe to describe how the outputs are generated if we stick 
only to resources that we can control. Group 4 hopes that this is still possible, at least 
with a special recipe of collapse. Group 1 favors a mechanistic recipe, at the price of 
introducing unobserved resources, the nonlocal hidden variables. Groups 2 and 3 stick 
to the resources that we have: For Group 2, all the outputs are generated according 
to the rules and it’s sheer chance that we end up perceiving one rather than another 
alternative; for Group 3, how nature generates the outputs is not our business as long as 
our predictions are correct. 

Further, let us consider the issue of whether quantum theory is complete, which was the 
title of the EPR paper. Both Groups 2 and 3 would definitely answer that quantum theory 
is complete; but we have noticed that they differ in what they call “quantum theory”: 
For the former it’s the kinematics and reversible dynamics, for the latter the recipe for 
computing probabilities. Representatives of Group 4 would like to complete the theory 
with a model of collapse, probably hinting that collapse has always been a desired feature 
of quantum theory, a statement with which the others would vehemently disagree. Group 
1 advocates for the need of quantum theory to be completed, at least in its ontology (our 
predictive power may well remain the same). 

Philosophical labels are also worth mentioning. Based on observation, I can certify 
that most of my colleagues would like to be called “realist” in the philosophical sense of 
believing in the existence and intelligibility of an external reality. Conversely, when the 
debates get heated, one frequently hears insults like “idealist,” “pragmatist,” or “solipsist”
thrown at the representatives of the other camp. One may wonder, for instance, where the
realism is in Group 3: They would answer that the laws for updating our beliefs come
to us from observation. All in all, labels of this kind should be avoided, also because few
of us physicists would be able to pass an exam of philosophy on their exact meaning— 
philosophers themselves may not agree! 

Another favorite topic of speculation is where great figures of the past would stand in 
this debate. The usual names that come up are of course Einstein, Bohr, and Bell himself. 
Assuming that he would stick to his aversion to a dice-playing God and to action-at-a- 
distance, Einstein would either think that we don’t have yet enough evidence (something 
like Group 4) or lean towards determinism at higher levels (Group 2). For Bohr, it is clear 
that he would despise Group 1; if I had to bet, his sympathy would go to Group 3. We
know more about Bell. He set out to construct a local hidden variable model, probably
thinking that it was possible. When he found that it is not, he shifted towards Bohmian
mechanics, and when collapse models started to be studied he clearly looked at them 
with great hope. Where he would stand today, given the evidence that collapse models 
have not delivered much, is only guesswork. 

Surely there is much more to it, and the readers will find further inspiration in 
reading more or in their own reflections. For the purpose of this book, it is time to 
put an end to this general introduction and to move on to the formalization of Bell 
nonlocality. 


EXERCISES 


Exercise 1.1 Prove that the Mermin inequality (1.10) indeed holds for deterministic local
variables. Hint: Either $c_0 = c_1$, or $c_0 = -c_1$.


Exercise 1.2 Re-derive Hardy’s local variables (LV) prediction (1.14) by ticking out from
Table 1.1 the quadruples of pre-established values that do not comply with the constraints
(1.11)-(1.13).


Exercise 1.3 We consider a modification of the detection loophole strategy for CHSH 
described in Table 1.2. At every round, Alice and Bob choose one of the eight quadruples listed
in the Table with probability $\frac{1}{8}$. Then Bob applies that strategy of declining with probability
$1 - p$, whereas with probability $p$ he produces the agreed output to whatever input he receives.
The verifier computes S using only the rounds in which both Alice and Bob replied. 

Prove that the verifier will observe S = rs (hint: What is the fraction of rounds in which 
Bob replies?). Deduce that this strategy gives a false positive for every p > 0. 


Exercise 1.4 We consider a false positive for the CHSH test based on the free will loophole. 


1. In every round, the verifier informs the players that one specific pair of inputs will be sent
out with probability $q$, while the three other pairs will be equally probable. Find the value
of $S$ that the players can achieve with local strategies for every $q \in [0, \frac{1}{4}]$. Deduce that
this leads to a false positive for every $q < \frac{1}{4}$ and that the quantum maximum $S = 2\sqrt{2}$
can be reached without having to set $q = 0$.

2. Consider now a different situation: The verifier informs the players that in every round 
the pair (x,y) = (1,1) is drawn with probability q and the other three pairs with equal 
probability. Does this open any loophole? Hint: The probabilities that enter a nonlocality 
test are conditional on the inputs.


L’universo [...] è scritto in linguaggio matematico e le lettere sono triangoli, cerchi e
altre figure geometriche.


The Universe is written in mathematical language and its letters are triangles, circles 
and other geometrical figures. 


Galileo, Il saggiatore


This chapter introduces the mathematical framework for the study of Bell nonlocality. 


2.1 Bell Scenarios, Processes, and Behaviors 


Virtually all the theory of Bell nonlocality is a study of probability distributions. This means 
that one is describing Bell tests in the asymptotic limit of infinitely many rounds. The 
application of this theory to real implementations, with their necessarily finite number of 
rounds, does not differ essentially from any laboratory use of statistical tools (see some 
specific remarks in section 2.6). 


2.1.1 Bell scenarios 


A Bell scenario is defined by specifying the number of players, the alphabet of inputs¹ that
each player can receive, and the alphabet of possible outputs (the latter may be different 
for each input, but we shall only consider Bell tests in which the alphabet of outputs is 
the same for each input). 

From now until chapter 5, we consider two-player, or bipartite, scenarios. The players
are usually called Alice and Bob. The $(M_A, m_A; M_B, m_B)$ Bell scenario² is one in which
Alice (Bob) has $M_A$ ($M_B$) possible inputs and $m_A$ ($m_B$) possible outputs for every
input. In general statements, I shall adopt the notation $x \in \mathcal{X} = \{1, \dots, M_A\}$, $a \in \mathcal{A} = \{1, \dots, m_A\}$,
$y \in \mathcal{Y} = \{1, \dots, M_B\}$, $b \in \mathcal{B} = \{1, \dots, m_B\}$. In concrete examples, it may be
more convenient to adopt other conventions, for instance $\{0, 1\}$ or $\{-1, +1\}$ for binary
alphabets, and I shall freely do so when no ambiguity is possible.


1 In the literature the inputs are often called “settings,” since in the laboratory they determine which
measurement to perform, i.e., how to set the measurement apparatus.

2 For unknown reasons, the conventional usage in the literature is rather $(M_A, M_B, m_A, m_B)$. I find it more
logical to divide according to player, rather than specifying first the alphabet of the inputs and then that of the
outputs.


2.1.2 Processes and strategies 


For every round, before receiving the inputs, the players agree on the process by which 
they will produce the outputs. At this point, the physical implementation of the process 
can be virtually anything: Computing a function, casting dice and tossing coins, using
quantum computers, discussing over the phone, or even operating as-yet-unknown
physical resources. The players obviously have to know what they are doing—but we 
don’t need to: We treat the process as a black box and describe it only in terms of how the 
outputs are related to the inputs. This relation may be stochastic. Therefore, we identify 
a process $\lambda$ with the collection of the probabilities $P_\lambda(a, b|x, y)$ of obtaining the outputs
$(a, b)$ when the inputs $(x, y)$ were given:


$$\mathcal{P}_\lambda = \{P_\lambda(a, b|x, y) \,|\, a \in \mathcal{A}, b \in \mathcal{B}, x \in \mathcal{X}, y \in \mathcal{Y}\}.$$ (2.1)


The probabilities must be non-negative, i.e.,


$$P_\lambda(a, b|x, y) \geq 0 \quad \forall\, a \in \mathcal{A}, b \in \mathcal{B}, x \in \mathcal{X}, y \in \mathcal{Y},$$ (2.2)


and properly normalized for each pair $(x, y)$, i.e.,


$$\sum_{a \in \mathcal{A}, b \in \mathcal{B}} P_\lambda(a, b|x, y) = 1 \quad \forall\, x \in \mathcal{X}, y \in \mathcal{Y}.$$ (2.3)


For each pair of inputs $(x, y)$, one has to give all the $P_\lambda(a, b|x, y)$ but one, because of the
normalization. The number of real parameters needed to describe a process is therefore
$D_{gen} = M_A M_B (m_A m_B - 1)$.

A deterministic process is one in which the outputs are uniquely determined by the
inputs, i.e., $P_\lambda(a, b|x, y) = \delta_{(a,b)=F(x,y,\lambda)} = \delta_{a=f(x,y,\lambda)}\, \delta_{b=g(x,y,\lambda)}$. The number of
deterministic processes is $t_D = (m_A m_B)^{M_A M_B}$: For each pair of inputs, one has to say
which pair of outputs is being produced.
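
The two counts above are easy to check by brute force in a small scenario. The following is a minimal sketch, not part of the formalism: the function names are my own, and the enumeration is only practical for very small alphabets.

```python
from itertools import product


def d_gen(M_A, m_A, M_B, m_B):
    """Number of real parameters of a generic process: M_A M_B (m_A m_B - 1)."""
    return M_A * M_B * (m_A * m_B - 1)


def deterministic_processes(M_A, m_A, M_B, m_B):
    """Yield every deterministic process as a dict {(x, y): (a, b)}."""
    input_pairs = list(product(range(1, M_A + 1), range(1, M_B + 1)))
    output_pairs = list(product(range(1, m_A + 1), range(1, m_B + 1)))
    # One pair of outputs must be chosen for each pair of inputs.
    for choice in product(output_pairs, repeat=len(input_pairs)):
        yield dict(zip(input_pairs, choice))


n_deterministic = sum(1 for _ in deterministic_processes(2, 2, 2, 2))
print(d_gen(2, 2, 2, 2), n_deterministic)
```

In the (2, 2; 2, 2) scenario this gives 12 parameters and $4^4 = 256$ deterministic processes, in agreement with the two formulas.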

We also have to say something about the players’ strategy over all the rounds of the 
test: 


• Measurement independence is assumed throughout the book till chapter 11: Thus, the
choice of the process should not depend on the inputs of the round itself. In formal
terms, $\lambda$ and $(x, y)$ are independent.


• We usually work with independent and identically distributed (i.i.d.) strategies. In
an i.i.d. strategy, the process for each round is selected independently of what
happened in previous rounds, by sampling from a given probability distribution
$Q(\lambda)$, with $Q(\lambda) \geq 0$ and $\int d\lambda\, Q(\lambda) = 1$. As mentioned in subsection 1.5.1, the
i.i.d. assumption should be taken critically: This will be addressed in section 2.6.


2.1.3 The observed behavior


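The verifier does not know which process is used in each round. He observes coarse-grained
statistics that are called the observed behavior, or simply the behavior, of the
players:


$$\mathcal{P} = \{P(a, b|x, y) \,|\, a \in \mathcal{A}, b \in \mathcal{B}, x \in \mathcal{X}, y \in \mathcal{Y}\},$$ (2.4)


where


$$P(a, b|x, y) = \int d\lambda\, Q(\lambda)\, P_\lambda(a, b|x, y).$$ (2.5)


By construction, the $P(a, b|x, y)$ are also non-negative and normalized:


$$P(a, b|x, y) \geq 0 \quad \forall\, a \in \mathcal{A}, b \in \mathcal{B}, x \in \mathcal{X}, y \in \mathcal{Y},$$ (2.6)


$$\sum_{a \in \mathcal{A}, b \in \mathcal{B}} P(a, b|x, y) = 1 \quad \forall\, x \in \mathcal{X}, y \in \mathcal{Y}.$$ (2.7)


Also, a generic behavior is specified by giving $D_{gen}$ real parameters. The behavior is going
to be deterministic if and only if the players use a given deterministic process $\lambda_0$ in every
round, i.e., $Q(\lambda) = \delta(\lambda - \lambda_0)$.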
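
To make the averaging in (2.5) concrete, here is a minimal sketch in which $\lambda$ takes only two values, so that the integral becomes a finite sum. The two processes and their weights are hypothetical, chosen only for illustration; the binary alphabets $\{0, 1\}$ are used instead of $\{1, 2\}$.

```python
from fractions import Fraction
from itertools import product

BITS = (0, 1)  # binary alphabets for inputs and outputs


def deterministic(F):
    """Process in which (a, b) = F(x, y) with certainty."""
    return {(a, b, x, y): Fraction(int((a, b) == F(x, y)))
            for a, b, x, y in product(BITS, repeat=4)}


def mix(weights, processes):
    """Discrete analogue of (2.5): P = sum_lambda Q(lambda) P_lambda."""
    return {key: sum(q * proc[key] for q, proc in zip(weights, processes))
            for key in processes[0]}


def is_valid(P):
    """Check non-negativity (2.6) and normalization (2.7)."""
    nonneg = all(p >= 0 for p in P.values())
    normalized = all(
        sum(P[a, b, x, y] for a, b in product(BITS, repeat=2)) == 1
        for x, y in product(BITS, repeat=2))
    return nonneg and normalized


always_zero = deterministic(lambda x, y: (0, 0))   # hypothetical process 1
copy_inputs = deterministic(lambda x, y: (x, y))   # hypothetical process 2
behavior = mix([Fraction(1, 4), Fraction(3, 4)], [always_zero, copy_inputs])
print(is_valid(behavior), behavior[1, 1, 1, 1], behavior[0, 0, 1, 1])
```

The verifier only has access to `behavior`: the fact that it arose from two deterministic processes used in 1/4 and 3/4 of the rounds is not visible in the statistics.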


2.2 No-Signaling Processes and Behaviors 


2.2.1 Definition and motivation 


We introduce the following: 


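Definition 2.1 A process $\lambda$ is called no-signaling if each player’s output statistics do not
depend on the other player’s inputs, that is if


$$P_\lambda(a|x, y) \stackrel{NS}{=} P_\lambda(a|x, y') = P_\lambda(a|x) \quad \text{for all } a \in \mathcal{A}, x \in \mathcal{X}, y, y' \in \mathcal{Y},$$
$$P_\lambda(b|x, y) \stackrel{NS}{=} P_\lambda(b|x', y) = P_\lambda(b|y) \quad \text{for all } b \in \mathcal{B}, y \in \mathcal{Y}, x, x' \in \mathcal{X}.$$ (2.8)


Similarly, a behavior is called no-signaling if


$$P(a|x, y) \stackrel{NS}{=} P(a|x, y') = P(a|x) \quad \text{for all } a \in \mathcal{A}, x \in \mathcal{X}, y, y' \in \mathcal{Y},$$
$$P(b|x, y) \stackrel{NS}{=} P(b|x', y) = P(b|y) \quad \text{for all } b \in \mathcal{B}, y \in \mathcal{Y}, x, x' \in \mathcal{X}.$$ (2.9)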


These conditions are called the no-signaling constraints. The generalization to multipartite 
scenarios is straightforward.

Obviously, if all the $P_\lambda$ satisfy the no-signaling condition, so will the behavior $\mathcal{P}$, but the
converse is not true (Exercise 2.1).
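
The no-signaling constraints are straightforward to test on a table of probabilities. The sketch below assumes binary alphabets by default and stores a behavior as a dictionary indexed by $(a, b, x, y)$; both example behaviors are my own illustrations. Exact fractions are used so that equalities can be tested without a numerical tolerance.

```python
from fractions import Fraction
from itertools import product

BITS = (0, 1)


def is_no_signaling(P, outs_a=BITS, outs_b=BITS, ins_x=BITS, ins_y=BITS):
    """Check the constraints (2.9) for a behavior stored as P[a, b, x, y]."""
    for a, x in product(outs_a, ins_x):
        # Alice's marginal P(a|x, y) must take a single value as y varies.
        if len({sum(P[a, b, x, y] for b in outs_b) for y in ins_y}) != 1:
            return False
    for b, y in product(outs_b, ins_y):
        # Bob's marginal P(b|x, y) must take a single value as x varies.
        if len({sum(P[a, b, x, y] for a in outs_a) for x in ins_x}) != 1:
            return False
    return True


# Alice outputs Bob's input: an obviously signaling deterministic process.
signaling = {(a, b, x, y): Fraction(int(a == y and b == 0))
             for a, b, x, y in product(BITS, repeat=4)}

# Uniformly random outputs constrained by a XOR b = x AND y.
correlated = {(a, b, x, y): Fraction(int((a ^ b) == (x & y)), 2)
              for a, b, x, y in product(BITS, repeat=4)}

print(is_no_signaling(signaling), is_no_signaling(correlated))
```

The second example has strongly input-dependent correlations, yet each player’s marginal is uniform for every choice of inputs, so the check passes. The constraints only concern marginals.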

When encountered for the first time, this important definition may sound convoluted: 
Wouldn’t it be simpler to say that each player’s output does not depend on the other 
players’ input? Here lies exactly the subtle distinction between no-signaling and local 
processes and behaviors. The latter will be formally introduced in the next section, but 
the reader should recall the qualitative discussion of these issues in subsection 1.2.3. 


2.2.2 Description of no-signaling statistics 


Because of the no-signaling constraints, it takes fewer parameters than $D_{gen}$ to describe a
no-signaling process or behavior. The counting can be done as follows. One can first take
the marginals as independent parameters: There are $M_A(m_A - 1)$ independent $P(a|x)$,
and $M_B(m_B - 1)$ independent $P(b|y)$. Consider now any choice of $(x, y)$: For every fixed
$b = \beta$, once the marginal $P(a)$ is given, one is left with $m_A - 1$ independent numbers
$P(a, \beta)$; similarly, for every fixed $a = \alpha$, once the marginal $P(b)$ is given, one is left with
$m_B - 1$ independent numbers $P(\alpha, b)$. All in all, a no-signaling process can be fully
specified by giving


$$D_{NS} = M_A(m_A - 1)\, M_B(m_B - 1) + M_A(m_A - 1) + M_B(m_B - 1)$$ (2.10)


independent real numbers. 

In line with this parameter counting, we can introduce the convenient Collins-Gisin 
representation of a no-signaling process or behavior (Collins and Gisin, 2004). We show 
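it for the Bell scenarios $M_A = M_B = 2$ and $m_A = m_B = 2, 3$:


$$\mathcal{P} = \begin{array}{c|c|c}
 & P(a=1|x=1) & P(a=1|x=2) \\ \hline
P(b=1|y=1) & P(1,1|1,1) & P(1,1|2,1) \\ \hline
P(b=1|y=2) & P(1,1|1,2) & P(1,1|2,2)
\end{array}\,,$$


$$\mathcal{P} = \begin{array}{c|cc|cc}
 & P(a=1|x=1) & P(a=2|x=1) & P(a=1|x=2) & P(a=2|x=2) \\ \hline
P(b=1|y=1) & P(1,1|1,1) & P(2,1|1,1) & P(1,1|2,1) & P(2,1|2,1) \\
P(b=2|y=1) & P(1,2|1,1) & P(2,2|1,1) & P(1,2|2,1) & P(2,2|2,1) \\ \hline
P(b=1|y=2) & P(1,1|1,2) & P(2,1|1,2) & P(1,1|2,2) & P(2,1|2,2) \\
P(b=2|y=2) & P(1,2|1,2) & P(2,2|1,2) & P(1,2|2,2) & P(2,2|2,2)
\end{array}\,.$$ (2.11)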


The numbers of entries of the tables are indeed $D_{NS} = 8, 24$. All the probabilities that
are not written explicitly can be reconstructed from the given ones: For instance, in the
second case, one has $P(a = 3|x) = 1 - P(a = 1|x) - P(a = 2|x)$;
$P(3, b|x, y) = P(b|y) - P(1, b|x, y) - P(2, b|x, y)$; etc. The example is easy to generalize to any other Bell scenario:
There must be entries for each value of the inputs, while one element of each output
alphabet can be skipped since its statistics can be reconstructed from the rest; and the
marginals can be taken out precisely because of the no-signaling constraints, which make
them independent of the other player’s input.
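
As a check of the parameter counting (2.10) against the size of the Collins-Gisin tables, one can extract the table from a full behavior and count its entries. This is a sketch under my own conventions: the behavior is a dictionary indexed by $(a, b, x, y)$ with labels starting from 1, the last output of each alphabet is the one that is skipped, and the uniform behavior is a hypothetical test case.

```python
from fractions import Fraction
from itertools import product


def d_ns(M_A, m_A, M_B, m_B):
    """Parameter count (2.10) for no-signaling processes and behaviors."""
    return (M_A * (m_A - 1) * M_B * (m_B - 1)
            + M_A * (m_A - 1) + M_B * (m_B - 1))


def collins_gisin(P, M_A, m_A, M_B, m_B):
    """Split a no-signaling behavior P[a, b, x, y] into Alice's marginals,
    Bob's marginals, and the table of joint entries."""
    cols = [(a, x) for x in range(1, M_A + 1) for a in range(1, m_A)]
    rows = [(b, y) for y in range(1, M_B + 1) for b in range(1, m_B)]
    # No-signaling is assumed, so marginals may be computed at y = 1 (x = 1).
    alice = [sum(P[a, b, x, 1] for b in range(1, m_B + 1)) for a, x in cols]
    bob = [sum(P[a, b, 1, y] for a in range(1, m_A + 1)) for b, y in rows]
    joint = [[P[a, b, x, y] for a, x in cols] for b, y in rows]
    return alice, bob, joint


# Hypothetical test behavior: uniformly random outputs in the (2, 3; 2, 3) scenario.
uniform = {(a, b, x, y): Fraction(1, 9)
           for a, b in product(range(1, 4), repeat=2)
           for x, y in product(range(1, 3), repeat=2)}
alice, bob, joint = collins_gisin(uniform, 2, 3, 2, 3)
n_entries = len(alice) + len(bob) + sum(len(row) for row in joint)
print(n_entries, d_ns(2, 3, 2, 3), d_ns(2, 2, 2, 2))
```

The function does not verify that its input is no-signaling; for a signaling behavior the marginals it returns would depend on the arbitrary choice of the other player’s input.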


2.3 Local Behaviors 


2.3.1 Definition and motivation 


In chapter 1 we introduced the idea of locality in a Bell test: In each round, the process $\lambda$
must be such that each player’s output is generated taking only the same player’s input 
into account. At the level of probability distributions, this translates into the following 
definition, which is obviously central to this book: 


Definition 2.2 A process is called local if it is of the form


$$P_\lambda(a, b|x, y) \stackrel{LV}{=} P_\lambda(a|x)\, P_\lambda(b|y),$$ (2.12)


where we have used the traditional label “Local Variables.” 

A behavior is called local if it can be written as a convex combination of local processes, 
as in (1.1). A behavior that cannot be written that way is called Bell-nonlocal, or simply 
nonlocal. 


Local processes and behaviors clearly satisfy the no-signaling constraints, thus they
can also be described by $D_{NS}$ real numbers and cast in the Collins-Gisin representation.
But a local process is much more constrained. At the risk of repeating this ad nauseam,
let us compare the two definitions in words:


• Local process: In each round, each player’s output is independent of the other
players’ inputs.

• No-signaling process: The marginal distribution of each player’s output is independent
of the other players’ inputs.


2.3.2 Local deterministic processes 


Local processes don’t need to be deterministic, but the class of local deterministic 
processes plays an important role. 


Definition 2.3 A process $\mathcal{P}_{LD,\lambda}$ is a local deterministic (LD) process if, for any input, the
output is deterministic:


$$P_{LD,\lambda}(a|x) = \delta_{a=f(x,\lambda)}, \quad P_{LD,\lambda}(b|y) = \delta_{b=g(y,\lambda)}.$$ (2.13)


An equivalent way of characterizing a LD process consists in just giving the list of outputs
for all possible inputs:


$$\lambda_{LD} = \{a_1, a_2, \dots, a_{M_A}; b_1, b_2, \dots, b_{M_B}\} \in \mathcal{A}^{M_A} \times \mathcal{B}^{M_B},$$ (2.14)


the link with the previous notation being $a_x = f(x, \lambda)$ and $b_y = g(y, \lambda)$. From this notation,
it is obvious that the number of LD processes is


$$t_{LD} = m_A^{M_A}\, m_B^{M_B}.$$ (2.15)


These LD processes are all and the only deterministic processes that satisfy the
no-signaling constraints (2.8); all the remaining $t_D - t_{LD}$ deterministic processes can only
be realized with signaling resources (Exercise 2.2).
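
For small scenarios, both the count (2.15) and the claim about no-signaling deterministic processes can be checked by enumeration. The sketch below is a numerical sanity check, not a substitute for the proof requested in Exercise 2.2; it uses the fact that, for a deterministic process, the no-signaling constraints amount to $a$ not depending on $y$ and $b$ not depending on $x$. The function names are mine.

```python
from itertools import product


def local_deterministic(M_A, m_A, M_B, m_B):
    """Yield every LD process as the list of outputs (2.14)."""
    for a_list in product(range(1, m_A + 1), repeat=M_A):
        for b_list in product(range(1, m_B + 1), repeat=M_B):
            yield a_list, b_list


def count_no_signaling_deterministic(M_A, m_A, M_B, m_B):
    """Count deterministic processes (x, y) -> (a, b) that are no-signaling."""
    xs, ys = range(M_A), range(M_B)
    input_pairs = list(product(xs, ys))
    output_pairs = list(product(range(m_A), range(m_B)))
    count = 0
    for choice in product(output_pairs, repeat=len(input_pairs)):
        F = dict(zip(input_pairs, choice))
        a_ignores_y = all(len({F[x, y][0] for y in ys}) == 1 for x in xs)
        b_ignores_x = all(len({F[x, y][1] for x in xs}) == 1 for y in ys)
        if a_ignores_y and b_ignores_x:
            count += 1
    return count


t_ld = sum(1 for _ in local_deterministic(2, 2, 2, 2))
print(t_ld, count_no_signaling_deterministic(2, 2, 2, 2))
```

In the (2, 2; 2, 2) scenario, 16 of the 256 deterministic processes survive, which matches $t_{LD} = 2^2 \cdot 2^2$.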


2.3.3 Characterization of local behaviors: Fine’s theorem 


By definition of Bell locality, the behavior seen by the verifier is a local behavior, if and
only if there exist a family of local processes $\lambda$ and a distribution $Q(\lambda)$ such that


$$P(a, b|x, y) \stackrel{LV}{=} \int d\lambda\, Q(\lambda)\, P_\lambda(a|x)\, P_\lambda(b|y).$$ (2.16)


This expression, which we have already encountered as (1.1), is the main mathematical
object in the formalization of Bell nonlocality: We’ll need to find ways to prove that a
given behavior is not of this form. As it turns out, this general form is not very useful
because the number of processes $\lambda$ and the distribution $Q(\lambda)$ are arbitrary. Fortunately,
there exist two compact characterizations of local behaviors:


Theorem 2.1 These two equivalent results are both due to Arthur Fine (1982): 


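(a) A behavior $\mathcal{P}$ is local if and only if it is a convex mixture of local deterministic
processes:


$$P(a, b|x, y) \stackrel{LV}{=} \sum_{j=1}^{m_A^{M_A}} \sum_{k=1}^{m_B^{M_B}} q_{jk}\, \delta_{a=f_j(x)}\, \delta_{b=g_k(y)}$$ (2.17)


with $\lambda = (j, k)$ and $\sum_{j,k} q_{jk} = 1$;

(b) Using the notation (2.14): A behavior $\mathcal{P}$ is local if and only if there exists a joint
probability distribution $P(a_1, \dots, a_{M_A}; b_1, \dots, b_{M_B})$ such that each $P(a, b|x, y)$ can
be obtained as the marginal


$$P(a, b|x, y) \stackrel{LV}{=} \sum_{\{a_j | j \neq x\}} \sum_{\{b_k | k \neq y\}} P(a_1, \dots, a_x = a, \dots, a_{M_A}; b_1, \dots, b_y = b, \dots, b_{M_B}).$$ (2.18)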


Proof Let us start from statement (a). The “if” direction is obvious, because (2.17) is
a special case of (2.16). For the reverse, given $P(a, b|x, y)$ of the form (2.16), one
can compute the cumulative distribution $\Sigma(a) = \sum_{\alpha \leq a} P_\lambda(\alpha|x)$, where we use the
labeling $a \in \mathcal{A} = \{1, \dots, m_A\}$ of the outputs (labelings being arbitrary, one can redo
the proof for any other choice, at the expense of a more cumbersome notation).
Let us then add a new local parameter $\mu_A \in [0, 1]$ and declare that $a$ is chosen
according to the following deterministic rule:


$$P_{\lambda,\mu_A}(a|x) = \begin{cases} 1 & \text{if } \Sigma(a - 1) \leq \mu_A < \Sigma(a), \\ 0 & \text{otherwise.} \end{cases}$$ (2.19)


If $\mu_A$ is drawn with uniform distribution, the original process is recovered:


$$\int_0^1 d\mu_A\, P_{\lambda,\mu_A}(a|x) = \int_{\Sigma(a-1)}^{\Sigma(a)} d\mu_A = P_\lambda(a|x).$$


The same construction can be made on Bob’s side. Therefore (2.16) can be
rewritten as


$$P(a, b|x, y) \stackrel{LV}{=} \int d\lambda\, Q(\lambda) \int_0^1 d\mu_A \int_0^1 d\mu_B\, P_{\lambda,\mu_A}(a|x)\, P_{\lambda,\mu_B}(b|y).$$


Now, as we said, each $P_{\lambda,\mu_A}(a|x)$ is deterministic, so it must be equal to one
of the $\delta_{a=f_j(x)}$; and similarly, each of the $P_{\lambda,\mu_B}(b|y)$ must be equal to one of the
$\delta_{b=g_k(y)}$. The weights $q_{jk}$ are just the integrals of the measure $Q(\lambda)\, d\lambda\, d\mu_A\, d\mu_B$ over
the corresponding sets. This concludes the proof of (a). From here, the proof of
statement (b) is straightforward. Indeed, on the one hand, LD processes fix the
value of the output for all possible inputs, so (a) $\Rightarrow$ (b) for


$$P(a_1 = f_j(1), \dots, a_{M_A} = f_j(M_A); b_1 = g_k(1), \dots, b_{M_B} = g_k(M_B)) = q_{jk}.$$


On the other hand, any assignment of values to all the $a_x$, respectively $b_y$, defines
a valid $f_j(x)$, respectively $g_k(y)$; so (b) $\Rightarrow$ (a).
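
The key step of the proof, the rule (2.19), can be illustrated for a single player and a single value of $\lambda$. The sketch below slices the interval of $\mu_A$ at all the values of the cumulative distributions: On each slice the output is fixed for every input, so each slice is one deterministic strategy and its length is the corresponding weight. The stochastic response used as an example is hypothetical, and exact fractions are assumed so that the cumulative distribution reaches exactly 1.

```python
from fractions import Fraction


def decompose(response):
    """response[x] lists P(a|x) for a = 1, ..., m.
    Returns {(a_1, ..., a_M): weight}, the deterministic strategies obtained
    by slicing mu in [0, 1) according to the rule (2.19)."""
    inputs = sorted(response)
    # sigma[x][a] is the cumulative distribution Sigma(a), with Sigma(0) = 0.
    sigma = {x: [sum(response[x][:a], Fraction(0))
                 for a in range(len(response[x]) + 1)]
             for x in inputs}
    cuts = sorted({s for x in inputs for s in sigma[x]})
    weights = {}
    for lo, hi in zip(cuts, cuts[1:]):
        # For every mu in [lo, hi) the output for each input x is the same.
        strategy = tuple(
            next(a for a in range(1, len(sigma[x]))
                 if sigma[x][a - 1] <= lo < sigma[x][a])
            for x in inputs)
        weights[strategy] = weights.get(strategy, Fraction(0)) + (hi - lo)
    return weights


def recover(weights, position, a):
    """P(a|x) recomputed as the total weight of the strategies with a_x = a."""
    return sum(w for strategy, w in weights.items() if strategy[position] == a)


# Hypothetical stochastic response of Alice for two inputs and two outputs.
response = {1: [Fraction(1, 2), Fraction(1, 2)],
            2: [Fraction(1, 4), Fraction(3, 4)]}
weights = decompose(response)
print(weights)
```

For this example the decomposition uses three of the four deterministic strategies, with weights 1/4, 1/4, and 1/2, and `recover` returns the original probabilities. The decomposition is not unique in general: The rule (2.19) just provides one that always works.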


2.3.4 Examples of local behaviors 


It is useful to look at a few examples of local behaviors. 


• The behavior of one player can always be reproduced with LV and thus with a
mixture of deterministic strategies. In the case of quantum behaviors, $\lambda$ may contain
the description of the quantum state $\rho$: The player can then compute on paper the
expected probability distribution for any measurement, then generate the outputs
according to that distribution. The fresh reader will find this statement obvious,
and such indeed it is. More sophisticated readers may have problems with notions
accumulated in their training: Did not great teachers like Feynman and Schwinger
derive the quantum formalism from the single-spin Stern-Gerlach measurement?
Isn’t it established that there cannot be a LV model for states with negative Wigner
function? These concerns are addressed in Appendix D.


• One of the most obvious bipartite behaviors that can be described with LV is
exactly the one that is presented in popular lore as an astonishing feat of quantum
entanglement: Two players always produce the same output when queried with the
same input. Even without mathematics, it is clear that this is so, because we can
achieve the same in everyday life.³ For instance, we could ask two persons to
dress identically, then send them to different locations: If queried about the same
piece of their apparel, they will provide the same answer. What is astonishing with
entanglement is that this happens while the outputs are not pre-established; and
we know that they are not, because some other measurements can be used to prove
Bell nonlocality.


• Every bipartite behavior in which one of the players has only one input is also local.
Indeed, if $|\mathcal{X}| = 1$, Alice’s actual input in each round is known in advance, so it’s
known by Bob too. In every round, the process $\lambda$ will determine Alice’s output and
distribute Bob’s outputs accordingly for each of his possible inputs.


• More subtle than the previous, let’s suppose that Bob’s outputs $\mathbf{b} = (b_1, \dots, b_{M_B})$ have
a joint probability distribution $P(\mathbf{b})$ independent of Alice’s inputs.⁴ For every input
$x$ of Alice, it follows from the previous point that one can construct $P(a_x, \mathbf{b})$ such
that summing over all Bob’s inputs but the desired $b_y$ yields $P(a_x, b_y)$. Moreover, the
independence of Bob’s distribution from Alice’s inputs requires $\sum_{a_x} P(a_x, \mathbf{b}) = P(\mathbf{b})$
for all $x$. Then, we can construct the joint probability distribution


$$P(a_1, \dots, a_{M_A}, \mathbf{b}) = \frac{\prod_{x} P(a_x, \mathbf{b})}{P(\mathbf{b})^{M_A - 1}}$$ (2.20)


such that all marginals match the behavior. Indeed, by first summing over all Alice’s
inputs but the desired $a_x$, one recovers $P(a_x, \mathbf{b})$; at which point, by construction the
sum over all but one of Bob’s inputs yields $P(a_x, b_y)$.


• If each player can produce perfect copies of the information shared to define the
process, the resulting behavior will be local. Indeed, each player can produce as 
many copies as his/her inputs, then apply the procedure to obtain the corresponding 
output, before the run of the test. This is a situation of pre-determined outputs. 
Thus, all behaviors that can be obtained with perfectly copiable information 
can also be obtained with pre-determined outputs (for a small modification, see 
Exercise 2.3). 
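
The construction (2.20) in the fourth example above can be verified numerically on a toy case. In the sketch below everything specific is an assumption made for illustration: Bob has two inputs with binary outputs and a uniform joint distribution $P(\mathbf{b})$, Alice has two inputs labeled 0 and 1, and the conditional distributions of her outputs are arbitrary choices. The formula requires $P(\mathbf{b}) > 0$ wherever it is evaluated.

```python
from fractions import Fraction
from itertools import product

BITS = (0, 1)
B_STRINGS = list(product(BITS, repeat=2))        # b = (b_1, b_2)
P_b = {b: Fraction(1, 4) for b in B_STRINGS}     # hypothetical P(b)


def P_ax_b(x, a, b):
    """Hypothetical P(a_x, b), built so that sum_a P(a_x, b) = P(b) for each x."""
    if x == 0:
        conditional = Fraction(int(a == b[0]))
    else:
        conditional = Fraction(3, 4) if a == (b[0] ^ b[1]) else Fraction(1, 4)
    return conditional * P_b[b]


def joint(a_list, b):
    """Equation (2.20): prod_x P(a_x, b) / P(b)^(M_A - 1). Requires P(b) > 0."""
    numerator = Fraction(1)
    for x, a in enumerate(a_list):
        numerator *= P_ax_b(x, a, b)
    return numerator / P_b[b] ** (len(a_list) - 1)


total = sum(joint(a_list, b)
            for a_list in product(BITS, repeat=2) for b in B_STRINGS)
marginals_ok = all(
    sum(joint((a0, a1), b) for a1 in BITS) == P_ax_b(0, a0, b)
    and sum(joint((a0, a1), b) for a0 in BITS) == P_ax_b(1, a1, b)
    for a0, a1, b in product(BITS, BITS, B_STRINGS))
print(total, marginals_ok)
```

The joint distribution is normalized and returns each $P(a_x, \mathbf{b})$ as a marginal, which by statement (b) of Fine’s theorem is what locality requires.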


3 This is a variation over the criterion set forward in one of the earliest reviews on Bell nonlocality (Werner 
and Wolf, 20010): If it can be done with ping-pong balls, it’s not interesting. 

4 Notice that this is stronger than requesting that the bipartite behavior is no-signaling: It’s requesting no- 
signaling even if Bob could actually obtain the outputs corresponding to all his inputs in each round. In quantum 
theory, which is formally no-signaling, this would always be the case; an obvious case in which P(b) exists is 
that of Bob using a set of commuting measurements.


2.4 The Local Polytope and Bell Inequalities


In this section, we study the set $\mathcal{L}$ of all local behaviors, in short the local set. We introduce
a geometric characterization which is fundamental for the study of Bell nonlocality. The
famous Bell inequalities are the conditions for a behavior $\mathcal{P}$ to belong to $\mathcal{L}$.


2.4.1 The local set as a polytope 


The first simple property of $\mathcal{L}$ is convexity (Figure 2.1, left): For all $p \in [0, 1]$ and for all
choices of $\mathcal{P}_{LV,1} \in \mathcal{L}$ and $\mathcal{P}_{LV,2} \in \mathcal{L}$, the convex sum $\mathcal{P} = p\,\mathcal{P}_{LV,1} + (1 - p)\,\mathcal{P}_{LV,2}$ belongs
to $\mathcal{L}$ too. The proof is straightforward: If we use the representation (2.17) for $\mathcal{P}_{LV,1}$
and $\mathcal{P}_{LV,2}$, the convex sum $\mathcal{P}$ is also of the same form with $q_{jk} = p\,q_{jk,1} + (1 - p)\,q_{jk,2}$.
Convexity is also an intuitive necessity: One possible way of realizing $\mathcal{P}$ is to imagine that
in a fraction $p$ of the rounds one is sampling from $\mathcal{P}_{LV,1}$, and in the other rounds one is
sampling from $\mathcal{P}_{LV,2}$. Clearly, which behavior is used in each round can be decided in
advance.

Because probabilities are bounded, the set $\mathcal{L}$ is compact. Now, a compact convex set
is the convex hull of its extremal points,⁵ i.e., those points that cannot be written as a
convex sum of other points in the set. The extremal points of $\mathcal{L}$ are all the local deterministic
behaviors and only those.


Figure 2.1 Top: Geometry of the convexity of the local set: If two local behaviors are represented by
points in a probability space, the whole line connecting them is also made of local behaviors. Bottom:
Sketch of the local polytope $\mathcal{L}$ as a subset of the set of no-signaling behaviors (light grey), both embedded
in $\mathbb{R}^{D_{NS}}$, with $D_{NS} = 2$ for the sake of the illustration (the minimal Bell scenario has $D_{NS} = 8$).

The local polytope as drawn has four extremal points, one trivial facet and three tight Bell inequalities
(see subsection 2.4.3). The dotted lines in the third dimension are a reminder that the no-signaling set is a set of
zero measure in the space of all possible behaviors.


5 This is very intuitive; the formal proof goes under the name of Krein-Milman theorem.

Let us prove this statement: We know from Fine’s theorem 2.1 that any $\mathcal{P} \in \mathcal{L}$ can be
written as a convex sum of LD behaviors: Thus, only LD behaviors may be extremal. Ex
absurdo, suppose that a LD behavior can be decomposed as a convex combination
$\mathcal{P}_{LD} = p\,\mathcal{P}_{LD,1} + (1 - p)\,\mathcal{P}_{LD,2}$ with $0 < p < 1$ and $\mathcal{P}_{LD,1} \neq \mathcal{P}_{LD,2}$. The latter two behaviors must
differ for at least one pair of inputs $(x, y)$: For these inputs, there are $(\alpha_1, \beta_1) \neq (\alpha_2, \beta_2)$
such that $P_{LD,1}(\alpha_1, \beta_1|x, y) = 1$ and $P_{LD,2}(\alpha_2, \beta_2|x, y) = 1$. But then
$P_{LD}(\alpha_1, \beta_1|x, y) = p$ and $P_{LD}(\alpha_2, \beta_2|x, y) = 1 - p$: So $\mathcal{P}_{LD}$ is not deterministic, contradicting the
assumption that it was LD.

Thus, the local set is a bounded polyhedron or polytope embedded in $\mathbb{R}^{D_{NS}}$ (Figure 2.1,
right). Since $t_{LD} > D_{NS}$, some extremal points are linearly dependent. The $(D_{NS} - 1)$-dimensional
hyperplanes that delimit it are called facets and are in finite number. To be
a facet, a hyperplane must satisfy two properties. First, to ensure the dimensionality, at
least $D_{NS}$ linearly independent extremal points must lie on the hyperplane (unless the
zero vector belongs to the facet, in which case the number is $D_{NS} - 1$).
Second, all the extremal points that do not lie on the hyperplane must be found on the
same side of it. Mathematically, let $v \in \mathbb{R}^{D_{NS}}$ be the vector normal to the facet and oriented
outside the polytope: If the equation for the points $\mathcal{P}$ of the facet is $v \cdot \mathcal{P} = f$, then


$$v \cdot \mathcal{P} \leq f \quad \text{for all } \mathcal{P} \in \mathcal{L}.$$ (2.21)


It is clear that a polytope is fully determined by listing its extremal points, because
this information is sufficient to find all the facets. Alternatively, a polytope is also fully
determined by listing its facets: Their zero-dimensional intersections are the extremal
points. In the case of $\mathcal{L}$, the extremal points are easy to list because they are just the LD
behaviors, while finding the facets requires some computation.⁶


2.4.2 Assessing the locality of a behavior 


Assessing whether a given behavior P is local is the elementary task of Bell nonlocality. 
Essential in theoretical studies, this assessment is also part of the more complex task of 
assessing whether a real, finite set of data is compatible with local realism, which will be 
discussed in section 2.6. It is an instance of a “membership problem,” since it is asking 
whether the behavior belongs to the local polytope $\mathcal{L}$.

We just noticed that it’s easy to write down the extremal points $\mathcal{P}_{LD,i}$, so we are going
to formulate the problem as: Find $q = \{q_1, \dots, q_{t_{LD}}\}$ such that


$$q_i \geq 0 \;\; \forall i, \quad \sum_{i=1}^{t_{LD}} q_i = 1, \quad \mathcal{P} = \sum_{i=1}^{t_{LD}} q_i\, \mathcal{P}_{LD,i}.$$ (2.22)


6 We shall see in section 9.1 that the opposite is the case for another polytope relevant for nonlocality, the
so-called no-signaling polytope: There, the facets are easy to list, and one has to work to find the extremal
points.

This is a feasibility linear program, one of the simplest instances of convex optimization.
The standard reference for convex optimization is the book of Boyd and Vandenberghe
(2004), and we provide a very elementary introduction as Appendix E. Here it is enough
to know that, while it is usually impossible to give a solution in closed form even for a
linear program, algorithms are known, and packages implementing them exist for most
mathematical software. These algorithms generate the so-called dual program of the
original one (which is then called the primal) and search for solutions for both.

For the problem under study, the duality of linear programs reflects the dual charac- 
terization of the polytope in terms of extremal points or in terms of facets. Indeed, as 
we show in Appendix E.2.1, the dual of the program (2.22) is a program that checks 
whether there is a facet that separates the behavior $\mathcal{P}$ from the whole local polytope. The
algorithm has therefore two possible outputs: If it finds a solution $q$ for the primal, it
returns this solution, proving that $\mathcal{P} \in \mathcal{L}$; otherwise, it returns the equation of a facet that
separates $\mathcal{P}$ from $\mathcal{L}$, proving that $\mathcal{P} \notin \mathcal{L}$. In either case, the verification of the correctness
of the output is straightforward. 
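
The last remark can be made concrete: Checking a certificate requires no optimization at all. The sketch below does not solve the linear program. It only verifies the two kinds of output in the (2, 2; 2, 2) scenario with binary labels, namely a list of weights $q$ for (2.22), or a pair $(v, f)$ that bounds all the extremal points as in (2.21) while being violated by the behavior. The function names and the two test behaviors are my own choices; as separating hyperplane I use the CHSH expression with local bound 2, written with coefficients $(-1)^{a+b+xy}$.

```python
from fractions import Fraction
from itertools import product

BITS = (0, 1)
KEYS = list(product(BITS, repeat=4))  # (a, b, x, y)


def ld_behavior(a_list, b_list):
    """Extremal point: a = a_list[x] and b = b_list[y] with certainty."""
    return {(a, b, x, y): int(a == a_list[x] and b == b_list[y])
            for a, b, x, y in KEYS}


LD_POINTS = [ld_behavior(al, bl)
             for al in product(BITS, repeat=2) for bl in product(BITS, repeat=2)]


def dot(v, P):
    return sum(v[k] * P[k] for k in KEYS)


def certifies_local(q, P):
    """Primal certificate (2.22): q_i >= 0, sum_i q_i = 1, P = sum_i q_i P_LD,i."""
    if any(qi < 0 for qi in q) or sum(q) != 1:
        return False
    return all(sum(qi * L[k] for qi, L in zip(q, LD_POINTS)) == P[k]
               for k in KEYS)


def certifies_nonlocal(v, f, P):
    """Dual certificate: v.P_LD <= f on every extremal point, yet v.P > f."""
    return all(dot(v, L) <= f for L in LD_POINTS) and dot(v, P) > f


uniform = {k: Fraction(1, 4) for k in KEYS}
q_uniform = [Fraction(1, 16)] * 16

chsh = {(a, b, x, y): (-1) ** (a + b + x * y) for a, b, x, y in KEYS}
correlated = {(a, b, x, y): Fraction(int((a ^ b) == (x & y)), 2)
              for a, b, x, y in KEYS}

print(certifies_local(q_uniform, uniform), certifies_nonlocal(chsh, 2, correlated))
```

Note that the second check accepts any valid separating hyperplane, whether or not it is a facet: For the mere purpose of proving $\mathcal{P} \notin \mathcal{L}$, this is sufficient.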

Although it is obvious from what precedes, I want to highlight explicitly that the 
locality of behaviors can be assessed without any prior knowledge of the facets—that is, in 
the terminology that we are going to introduce, of the Bell inequalities. This is going to 
be important, since listing all the facets is a daunting task for most Bell scenarios, as we 
shall see in chapters 4 and 5. 


2.4.3 The notion of Bell inequality 


A criterion that separates some nonlocal behavior from all LV behaviors is generically
called a Bell inequality. Usually, one considers linear Bell inequalities, i.e., criteria of the
form


$$I(\mathcal{P}) = \sum_{a,b,x,y} v_{abxy}\, P(a, b|x, y) \leq I_L.$$ (2.23)


As mentioned, the most natural candidates are the facets of the local polytope. The number 
of facets being finite, it is in principle possible to list them all for any Bell scenario. 

Not all the facets of $\mathcal{L}$ are candidates for Bell inequalities: All the $M_A M_B m_A m_B$
conditions $P(a, b|x, y) \geq 0$ define facets,⁷ called positivity facets or trivial facets.

The equations (2.23) of the non-trivial facets of the local polytope are called tight
Bell inequalities. Non-tight Bell inequalities are suboptimal for the mere certification of
nonlocality but may have additional interesting features; we shall encounter examples in
subsection 4.2.3 and in section 5.3.


7 This is not completely trivial: Since $\mathcal{L}$ satisfies also no-signaling, one should check if some of the positivity
conditions happen to coincide when the no-signaling constraints (2.9) are enforced. As it turns out, this may
happen, but only in uninteresting scenarios: Notably, if $m_B = 1$, i.e., the only output for Bob is 0, then the
no-signaling condition on the marginal of Alice reads $P(a|x) = P(a, 0|x, y)$ for all $y$; in other words, all the
conditions $P(a, 0|x, y) \geq 0$ are the same. Also, one may wonder why there are no facets associated to the other
boundary, $P(a, b|x, y) \leq 1$. The reason is that the normalization $\sum_{a,b} P(a, b|x, y) = 1$ is always assumed, and
consequently the upper bound on each $P(a, b|x, y)$ is a consequence of the positivity of all the other probabilities.


2.4.4 Multiple versions, liftings, and no-signaling 
forms of an inequality 


Every Bell inequality appears in multiple versions in probability space, due to the freedom 
of relabeling: If a criterion tests nonlocality, the same criterion in which some of the labels 
have been changed will also test nonlocality in the same way. The possible relabelings are 
the following: 


• One can relabel the inputs of each player. Specifically, if $\pi$ is a permutation over
$M_A$ elements and $\pi'$ is a permutation over $M_B$ elements, then


$$v_{abxy} \longrightarrow v_{ab\pi(x)\pi'(y)}$$ (2.24)


defines another version of the same inequality.


• Once the labeling of the inputs is chosen, for each input one is free to relabel the
outputs. Specifically, for each $x$ we write $\pi_x$ a permutation over $m_A$ elements, and
for each $y$ we write $\pi_y$ a permutation over $m_B$ elements: Then


$$v_{abxy} \longrightarrow v_{\pi_x(a)\pi_y(b)xy}$$ (2.25)


defines another version of the same inequality.


• The most commonly studied Bell scenarios are such that $(M_A, m_A) = (M_B, m_B)$.
In this case, the multiplicity can be further increased by a factor 2 by permuting the
name of the players, i.e.,


$$v_{abxy} \longrightarrow v_{bayx} \quad [\text{if } (M_A, m_A) = (M_B, m_B)].$$ (2.26)


This counting shows that there can be as many as $M_A!\, M_B!\, (m_A!)^{M_A} (m_B!)^{M_B}$ versions
of a given inequality, or even twice as many if the players can be exchanged. However,
several relabelings may lead to the same version, so the multiplicity of an inequality is
usually smaller. The multiplicity must be taken into account: For $\mathcal{P}$ to be local, one must
check that $I(\mathcal{P}) \leq I_L$ holds for all versions of all the Bell inequalities in the scenario. This
may be bothersome on paper, but recall that a simple linear program does it without the 
need to give any Bell inequality a priori. 
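
The gap between the maximal number of relabelings and the actual multiplicity can be seen by brute force on a small example. The sketch below applies (2.24), (2.25), and (2.26) to the CHSH coefficients, written as $v_{abxy} = (-1)^{a+b+xy}$ with binary labels, and counts how many distinct arrays of coefficients come out. It only compares arrays entry by entry, so it does not detect equivalences obtained through normalization or no-signaling.

```python
from itertools import permutations, product
from math import factorial

BITS = (0, 1)
KEYS = list(product(BITS, repeat=4))  # (a, b, x, y)


def max_versions(M_A, m_A, M_B, m_B):
    """Upper bound on the number of versions, including the exchange of players."""
    n = (factorial(M_A) * factorial(M_B)
         * factorial(m_A) ** M_A * factorial(m_B) ** M_B)
    return 2 * n if (M_A, m_A) == (M_B, m_B) else n


chsh = {(a, b, x, y): (-1) ** (a + b + x * y) for a, b, x, y in KEYS}

versions = set()
for pi, pi_prime in product(permutations(BITS), repeat=2):      # inputs, (2.24)
    for pi_x in product(permutations(BITS), repeat=2):           # Alice's outputs, one per x, (2.25)
        for pi_y in product(permutations(BITS), repeat=2):       # Bob's outputs, one per y, (2.25)
            v = {(a, b, x, y): chsh[pi_x[x][a], pi_y[y][b], pi[x], pi_prime[y]]
                 for a, b, x, y in KEYS}
            swapped = {(a, b, x, y): v[b, a, y, x] for a, b, x, y in KEYS}  # (2.26)
            versions.add(tuple(v[k] for k in KEYS))
            versions.add(tuple(swapped[k] for k in KEYS))

print(max_versions(2, 2, 2, 2), len(versions))
```

Out of 128 possible relabelings, only 8 distinct arrays of coefficients are found for this inequality.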

Besides, when moving from a Bell scenario to a larger one (more players
and/or more inputs and/or more outputs), all the inequalities of the smaller scenarios
are inherited in several versions called liftings. As an example, suppose that
$\{v_{abxy} \,|\, a, b, x, y \in \{0, 1\}\}$ defines a Bell inequality in the (2,2; 2,2) scenario: If we
move to the (3,2; 2,2) Bell scenario, we can already anticipate three Bell inequalities,
defined by $\{v_{abxy} \,|\, x \in \{0, 1\}, a, b, y \in \{0, 1\}\}$, $\{v_{abxy} \,|\, x \in \{0, 2\}, a, b, y \in \{0, 1\}\}$, and
$\{v_{abxy} \,|\, x \in \{1, 2\}, a, b, y \in \{0, 1\}\}$, each of course with its multiplicity due to relabelings.
If the original inequality is a facet of the local polytope, the liftings will also be facets of
the local polytope for the larger scenario.

Finally, we must mention that, given a version of an inequality, one can transform
the $v_{abxy}$ almost beyond recognition by exploiting the normalization of probabilities and
even more the no-signaling constraints (2.9). Just to give an idea, consider $a \in \{0, 1\}$:
With normalization, one can write $1 = P(a = 0|x, y) + P(a = 1|x, y)$, and using
no-signaling one can even write $1 = P(a = 0|x, y) + P(a = 1|x, y')$ with $y' \neq y$. We shall
refer to different forms of an inequality, and we shall see several examples, starting from
the next section.

Ultimately, all these are symmetries that can be elegantly and efficiently tackled with 
tools of group theory (Rosset, Bancal, and Gisin, 2014). In this book, we shall deal 
with these issues in a rather pedestrian way, limiting ourselves to specific examples and 
occasional warnings. 


2.5 The CHSH Scenario 


The simplest Bell scenario is (2, 2; 2, 2), usually known as the CHSH scenario after the work
of Clauser et al. (1969) that we have already encountered in section 1.3.


2.5.1 The local polytope and the CHSH inequality 


In this scenario, the local polytope is embedded in $\mathbb{R}^8$ and has 16 extremal points,
each being the product of one deterministic process for Alice [$(a_0, a_1) = (0, 0)$, $(0, 1)$,
$(1, 0)$, or $(1, 1)$] with one for Bob [same lists for $(b_0, b_1)$]. We have already listed these
points, with a different labeling, in Table 1.1. In the Collins-Gisin representation, the
extremal points read

$$P_{(0,0),(0,0)} = \begin{array}{c|cc} & 1 & 1 \\ \hline 1 & 1 & 1 \\ 1 & 1 & 1 \end{array}, \quad
P_{(0,0),(0,1)} = \begin{array}{c|cc} & 1 & 1 \\ \hline 1 & 1 & 1 \\ 0 & 0 & 0 \end{array}, \quad \dots, \quad
P_{(1,1),(1,1)} = \begin{array}{c|cc} & 0 & 0 \\ \hline 0 & 0 & 0 \\ 0 & 0 & 0 \end{array} \tag{2.27}$$


where the completion of the list is left as Exercise 2.4. Having this list, the local polytope
is fully determined, and one can set out to find the facets. Even in this simplest of cases,
the analytical derivation is tedious; we give it in Appendix G.1 for a meaningful sub-polytope.
The complete polytope has 24 facets (Froissart, 1981; Fine, 1982; Collins and
Gisin, 2004): The expected 16 positivity facets and 8 facets defined by

$$\begin{aligned}
I^{(1)} &= E_{00} + E_{01} + E_{10} - E_{11} \le 2 \\
I^{(2)} &= E_{00} + E_{01} - E_{10} + E_{11} \le 2 \\
I^{(3)} &= E_{00} - E_{01} + E_{10} + E_{11} \le 2 \\
I^{(4)} &= -E_{00} + E_{01} + E_{10} + E_{11} \le 2 \\
I^{(5)} &= -E_{00} + E_{01} + E_{10} + E_{11} \ge -2 \\
I^{(6)} &= E_{00} - E_{01} + E_{10} + E_{11} \ge -2 \\
I^{(7)} &= E_{00} + E_{01} - E_{10} + E_{11} \ge -2 \\
I^{(8)} &= E_{00} + E_{01} + E_{10} - E_{11} \ge -2
\end{aligned}$$

where

$$E_{xy} = P(a = b|x, y) - P(a \neq b|x, y). \tag{2.28}$$


It is manifest that these are just versions of the same inequality up to relabelings: $I^{(2)}$
is obtained from $I^{(1)}$ by permuting the labels of Bob’s inputs, $I^{(3)}$ is obtained from $I^{(2)}$
by permuting the roles of Alice and Bob (or by permuting the labels of both Alice’s and
Bob’s inputs), etc. Due to the symmetry of the expression, we have only 8 instead of
the potential $2\,M_A!\,M_B!\,(m_A!)^{M_A}(m_B!)^{M_B} = 128$ versions. Thus, up to relabelings, there is
only one tight inequality in the CHSH scenario, namely the CHSH inequality

$$S = E_{00} + E_{01} + E_{10} - E_{11} \le 2. \tag{2.29}$$

This is the same CHSH expression (1.3) that we derived in section 1.3. There, we took
advantage of the smart labeling $a, b \in \{-1, +1\}$ to write $\langle a_x b_y \rangle = \sum_{a,b} ab\,P(a, b|x, y) = E_{xy}$.
The current derivation, with $E_{xy}$ defined by (2.28), highlights that any labeling of the
outputs is allowed, as indeed it should be.
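The count of 8 distinct versions out of 128 relabelings can be checked by brute force. The following is a minimal sketch, assuming the coefficient convention $v_{abxy} = (-1)^{a+b+xy}$ for the CHSH expression with outputs $a, b \in \{0, 1\}$; the function names are illustrative and not taken from the text.

```python
from itertools import permutations, product


def chsh_coefficients():
    """Coefficients v[a, b, x, y] = (-1)**(a + b + x*y), so that
    sum_abxy v P(a,b|x,y) = E00 + E01 + E10 - E11."""
    return {(a, b, x, y): (-1) ** (a + b + x * y)
            for a, b, x, y in product((0, 1), repeat=4)}


def relabel(v, pi_x, pi_y, out_a, out_b, swap):
    """Apply one relabeling: input permutations (pi_x, pi_y), one output
    permutation per input (out_a[x], out_b[y]) and, optionally, the
    exchange of the two players."""
    new = {}
    for (a, b, x, y) in v:
        src = (out_a[x][a], out_b[y][b], pi_x[x], pi_y[y])
        if swap:
            src = (src[1], src[0], src[3], src[2])
        new[(a, b, x, y)] = v[src]
    return new


def distinct_versions():
    """Return (number of relabelings tried, number of distinct versions)."""
    perms = list(permutations((0, 1)))
    v = chsh_coefficients()
    seen = set()
    tried = 0
    for pi_x, pi_y in product(perms, repeat=2):
        for out_a in product(perms, repeat=2):      # one permutation per x
            for out_b in product(perms, repeat=2):  # one permutation per y
                for swap in (False, True):
                    w = relabel(v, pi_x, pi_y, out_a, out_b, swap)
                    seen.add(tuple(sorted(w.items())))
                    tried += 1
    return tried, len(seen)
```

Calling `distinct_versions()` enumerates every relabeling once and stores each resulting coefficient table in a set, so coinciding versions are counted only once.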


2.5.2 The CH form of the CHSH inequality 


Among the infinitely many forms of the CHSH inequality equivalent under no-signaling,
the most frequently encountered one is

$$S_{CH} = \sum_{x,y=0}^{1} (-1)^{xy} P(0, 0|x, y) - P(a = 0|x = 0) - P(b = 0|y = 0) \le 0, \tag{2.30}$$

called CH form because it was introduced by Clauser and Horne (1974). To prove the
equivalence, one can plug into (2.29) the expression

$$E_{xy} = 4P(0, 0|x, y) - 2P(a = 0|x) - 2P(b = 0|y) + 1, \tag{2.31}$$

that follows from the relations

$$\begin{aligned}
P(0, 1|x, y) &= P(a = 0|x) - P(0, 0|x, y), \\
P(1, 0|x, y) &= P(b = 0|y) - P(0, 0|x, y), \\
P(1, 1|x, y) &= 1 - P(0, 0|x, y) - P(0, 1|x, y) - P(1, 0|x, y) \\
&= 1 - P(a = 0|x) - P(b = 0|y) + P(0, 0|x, y).
\end{aligned}$$


The CH expression contains exactly the terms that appear in the Collins-Gisin
representation. If the behavior $P$ is written in that representation, one can conveniently write
the Bell expression as a term-by-term multiplication of tables:

$$S_{CH} = \mathcal{I}_{CH} \cdot P, \quad \text{with} \quad
\mathcal{I}_{CH} = \begin{array}{c|cc} & -1 & 0 \\ \hline -1 & 1 & 1 \\ 0 & 1 & -1 \end{array} \tag{2.32}$$


By the further replacements $P(a = 0|x = 0) - P(0, 0|0, 1) = P(0, 1|0, 1)$ and
$P(b = 0|y = 0) - P(0, 0|1, 0) = P(1, 0|1, 0)$, $S_{CH}$ can be rewritten in the Eberhard form
(Eberhard, 1993)

$$S_{Ebe} = P(0, 0|0, 0) - P(0, 1|0, 1) - P(1, 0|1, 0) - P(0, 0|1, 1) \le 0. \tag{2.33}$$

The numerical values of the three expressions are related through

$$S_{CH} = S_{Ebe} = \frac{1}{4} S - \frac{1}{2}. \tag{2.34}$$
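Relation (2.34) can be verified numerically on any no-signaling behavior. The sketch below assumes, for illustration only, a behavior with uniform marginals built from a table of correlators, $P(a,b|x,y) = \frac{1}{4}[1 + (-1)^{a+b} E_{xy}]$ with outputs in $\{0, 1\}$; the helper names are hypothetical.

```python
import math
from itertools import product


def behavior_from_correlators(E):
    """No-signaling behavior with uniform marginals (an assumption of this
    sketch): P(a,b|x,y) = (1 + (-1)**(a+b) * E[x][y]) / 4."""
    return {(a, b, x, y): (1 + (-1) ** (a + b) * E[x][y]) / 4
            for a, b, x, y in product((0, 1), repeat=4)}


def chsh(P):
    """Standard form (2.29), with E_xy defined as in (2.28)."""
    return sum((-1) ** (x * y) * (-1) ** (a + b) * p
               for (a, b, x, y), p in P.items())


def ch(P):
    """CH form (2.30); the marginals are read off at y = 0 and x = 0."""
    joint = sum((-1) ** (x * y) * P[(0, 0, x, y)]
                for x, y in product((0, 1), repeat=2))
    pa0 = P[(0, 0, 0, 0)] + P[(0, 1, 0, 0)]  # P(a=0|x=0)
    pb0 = P[(0, 0, 0, 0)] + P[(1, 0, 0, 0)]  # P(b=0|y=0)
    return joint - pa0 - pb0


def eberhard(P):
    """Eberhard form (2.33)."""
    return (P[(0, 0, 0, 0)] - P[(0, 1, 0, 1)]
            - P[(1, 0, 1, 0)] - P[(0, 0, 1, 1)])


# Example: correlators E_xy = (-1)**(x*y) / sqrt(2)
r = 1 / math.sqrt(2)
P_example = behavior_from_correlators([[r, r], [r, -r]])
```

For `P_example` the three functions return values satisfying `ch(P) == eberhard(P) == chsh(P) / 4 - 1 / 2` up to rounding.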


A historical remark must be made at this point. Clauser and Horne presented (2.30) as
a new inequality, which is perfectly understandable for those early days. Unfortunately,
before the notions of no-signaling and of the local polytope were fully formalized, the
belief that CHSH and CH are inequivalent was reinforced by a misunderstanding in
the approach to the detection loophole;⁸ as received knowledge, it has persisted in the
literature till very recently.


8 The first studies of the detection loophole addressed it correctly with the CH form, but somehow failed to
see how to address it with the CHSH form. As a result, the conviction spread that the latter is simply unsuited
for the task. In Appendix B.3 we show how to approach the detection loophole both with the CHSH and the
CH form, and obtain the same result.




2.6 Proper Statistics: Beyond i.i.d. and Finite Samples 


In this last section, we prove that the local polytope remains unchanged if the underlying
strategy is not i.i.d., then discuss some subtleties that one should take into account for a
proper statistical assessment of a Bell test.


2.6.1 Non-i.i.d. behaviors and the local polytope 


In this chapter, we have started from a definition of behaviors (2.5) that assumes an
i.i.d. strategy. We have built the local polytope from the corresponding definition of i.i.d.
local behaviors. The i.i.d. assumption must be removed: As mentioned in subsection
1.5.1, if the players could pass a Bell test by just adopting a local non-i.i.d. strategy, Bell
nonlocality wouldn’t be worth much discussion.⁹

For behaviors that describe an asymptotically large number of rounds, we are going
to prove that the definition of the local polytope does not change if one relaxes the
i.i.d. assumption. For simplicity of notation, we consider processes $\lambda$ that correlate two
consecutive rounds, but the proof extends easily to any number of rounds. If two rounds
are known to be correlated, the proper description of the behavior is¹⁰
$P(a_1, a_2, b_1, b_2|x_1, x_2, y_1, y_2)$. Given this, the single-round statistics can be computed through

$$\begin{aligned}
P'(a, b, x, y) = {} & \frac{1}{2} \sum_{a_2, b_2, x_2, y_2} P(a, a_2, b, b_2|x, x_2, y, y_2)\, P(x, x_2, y, y_2) \\
& + \frac{1}{2} \sum_{a_1, b_1, x_1, y_1} P(a_1, a, b_1, b|x_1, x, y_1, y)\, P(x_1, x, y_1, y)
\end{aligned}$$

where the factor $\frac{1}{2}$ comes from the normalization $\sum_{a,b,x,y} P'(a, b, x, y) = 1$. Thus, the
effective single-round behavior is

$$P'(a, b|x, y) = P'(a, b, x, y) / P'(x, y), \tag{2.35}$$

where $P'(x, y) = \frac{1}{2} \left[ \sum_{x_2, y_2} P(x, x_2, y, y_2) + \sum_{x_1, y_1} P(x_1, x, y_1, y) \right]$. Now, suppose that the
two-round behavior $P(a_1, a_2, b_1, b_2|x_1, x_2, y_1, y_2)$ is local: By Fine’s theorem, it is a
convex mixture of local deterministic two-round behaviors. But, any LD two-round
behavior is the product of two LD single-round behaviors (the fact that the two rounds
may not be independent simply defines the rule with which each single-round behavior
is chosen). Thus $P'(a, b|x, y)$ is a convex mixture of LD single-round behaviors, which
means it’s local. A contrario, if $P'(a, b|x, y)$ were nonlocal, $P(a_1, a_2, b_1, b_2|x_1, x_2, y_1, y_2)$
must have been nonlocal too.


9 Still, it took almost forty years after Bell’s paper for this matter to be properly addressed by Richard Gill
[as early as 2001, see references in (Gill, 2014)] and (Barrett et al., 2002).


10 It should be clear that, in this subsection, $a_j$ does not mean $a_{x=j}$, but the output of Alice in the $j$-th round.

Notice that the proof does not require the rounds to be sequential: The result holds
even in the case of parallel queries, where the verifier sends all the inputs to the players
and elicits all the outputs. In fact, the critical assumption is measurement independence,
which we assume systematically till the last chapter (and we’ll see in subsection 11.4.3 that,
in the presence of measurement dependence, a local two-round behavior may lead to a
nonlocal effective single-round behavior).
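The argument can be illustrated numerically in the parallel-query case. The sketch below is an illustration under stated assumptions, not part of the proof: inputs are uniform and independent of the strategy (measurement independence), each player uses a deterministic function of both of their inputs, and the effective single-round behavior is built as in (2.35). All names are hypothetical.

```python
import random
from itertools import product


def random_two_round_strategy(rng):
    """Deterministic local map (input round 1, input round 2) -> (out 1, out 2)."""
    return {(u, v): (rng.randint(0, 1), rng.randint(0, 1))
            for u, v in product((0, 1), repeat=2)}


def effective_single_round(f_alice, f_bob):
    """Effective behavior P'(a,b|x,y) of (2.35) for uniform inputs."""
    joint = {}
    weight = 0.5 / 16  # factor 1/2 times P(x1, x2, y1, y2) = 1/16
    for x1, x2, y1, y2 in product((0, 1), repeat=4):
        a1, a2 = f_alice[(x1, x2)]
        b1, b2 = f_bob[(y1, y2)]
        for key in ((a1, b1, x1, y1), (a2, b2, x2, y2)):
            joint[key] = joint.get(key, 0.0) + weight
    # With uniform inputs, P'(x, y) = 1/4 for every pair (x, y).
    return {key: p / 0.25 for key, p in joint.items()}


def chsh(P):
    """CHSH value (2.29) of a behavior stored as {(a, b, x, y): probability}."""
    return sum((-1) ** (x * y) * (-1) ** (a + b) * p
               for (a, b, x, y), p in P.items())


def largest_chsh(trials=500, seed=1):
    """Largest |S| found over randomly drawn two-round local strategies."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(trials):
        P = effective_single_round(random_two_round_strategy(rng),
                                   random_two_round_strategy(rng))
        best = max(best, abs(chsh(P)))
    return best
```

Under these assumptions `largest_chsh()` never exceeds the local bound 2, in agreement with the statement that correlating the rounds does not enlarge the local polytope.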


2.6.2 Certifying nonlocality on real data 


Asymptotic statements like the previous one provide a simple working ground for
theorists. But real data always come in finite amount, so one has to discuss fluctuations.
Among pioneering work, one must cite that of Richard Gill, summarized in (Gill,
2014), and the first dismissal of the memory loophole by (Barrett et al., 2002). Several
improvements were reported around the time of the loophole-free Bell tests; we refer to
(Zhang et al., 2013; Elkouss and Wehner, 2016) for a first introduction. These tools are
still being refined at the moment of writing: For instance, a Bayesian approach under the
i.i.d. assumption has been provided in (Gu et al., 2019).

For the level of this book, it is worth pointing out one peculiarity: In the case of
nonlocality, unprocessed raw data fail to provide a point estimator. The reason is that the
local polytope is by construction a subset of the set of no-signaling behaviors, but the
estimator of the behavior obtained from frequencies over a finite sample will deviate from
the strict no-signaling condition. Plugging such an estimator into the linear program
(2.22) will always return a negative result. It’s even worse if one plugs the signaling
estimator into their favorite form of a Bell inequality¹¹ (subsection 2.4.4): Given a
signaling behavior and a Bell inequality as facet of the local polytope, one can always
find both a form of the inequality that is violated by the behavior and a form that is not
(Figure 2.2). If one wants a point estimator, one first has to project the estimator obtained
from the frequencies into the no-signaling set (Lin et al., 2018). The error estimator for
this procedure is not known at the moment of writing.

[Figure 2.2: sketch of the no-signaling set (NS) with its local and nonlocal regions, a Bell
inequality B, and two estimated signaling behaviors P′ and P″.]

Figure 2.2 If observed frequencies are taken as estimators of probabilities, the estimated behavior does
not satisfy the no-signaling constraint rigorously. Bell inequalities are defined only in the no-signaling
set (NS), because so is the local polytope (L): Thus, plugging frequencies into one’s favorite form of a Bell
inequality is, strictly speaking, ambiguous. This is illustrated in the figure: The estimated signaling
behavior P′ (respectively, P″) is manifestly a fluctuation of a nonlocal (respectively, local) behavior.
However, if the inequality is written as B, P′ does not violate it while P″ does.
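The ambiguity can be made concrete with a toy table of counts. The numbers below are invented for illustration (1000 rounds per input pair) and are slightly signaling, as finite-sample frequencies typically are; the function names are hypothetical. The sketch evaluates the CHSH form (2.29), the CH form (2.30) with two different ways of estimating the marginals, and the Eberhard form (2.33).

```python
from itertools import product

# Invented counts N[(x, y)] = (N00, N01, N10, N11), indexed by outputs (a, b).
COUNTS = {
    (0, 0): (430, 70, 80, 420),
    (0, 1): (420, 90, 70, 420),
    (1, 0): (425, 75, 85, 415),
    (1, 1): (75, 430, 420, 75),
}


def frequencies(counts):
    """Raw frequency estimator f(a,b|x,y), stored as {(a, b, x, y): value}."""
    f = {}
    for (x, y), n in counts.items():
        total = sum(n)
        for (a, b), n_ab in zip(product((0, 1), repeat=2), n):
            f[(a, b, x, y)] = n_ab / total
    return f


def chsh(P):
    return sum((-1) ** (x * y) * (-1) ** (a + b) * p
               for (a, b, x, y), p in P.items())


def ch(P, y_ref=0, x_ref=0):
    """CH form (2.30). Alice's marginal is estimated from the rounds with
    Bob's input y_ref, Bob's marginal from the rounds with Alice's input
    x_ref; under no-signaling the choice would not matter."""
    joint = sum((-1) ** (x * y) * P[(0, 0, x, y)]
                for x, y in product((0, 1), repeat=2))
    pa0 = P[(0, 0, 0, y_ref)] + P[(0, 1, 0, y_ref)]
    pb0 = P[(0, 0, x_ref, 0)] + P[(1, 0, x_ref, 0)]
    return joint - pa0 - pb0


def eberhard(P):
    return (P[(0, 0, 0, 0)] - P[(0, 1, 0, 1)]
            - P[(1, 0, 1, 0)] - P[(0, 0, 1, 1)])


f = frequencies(COUNTS)
forms = {
    "S/4 - 1/2": chsh(f) / 4 - 0.5,
    "CH (marginals from y=0, x=0)": ch(f, 0, 0),
    "CH (marginals from y=1, x=1)": ch(f, 1, 1),
    "Eberhard": eberhard(f),
}
```

On a no-signaling behavior all entries of `forms` would coincide by (2.34); on these raw frequencies they do not, which is why the value of "the" inequality is not well defined before projecting onto the no-signaling set.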


EXERCISES 


Exercise 2.1 Construct an example of a no-signaling behavior as a coarse-graining of 
signaling processes. 


Exercise 2.2 Prove that, in any Bell scenario, deterministic behaviors are either local 
deterministic or signaling. 


Exercise 2.3 Alice and Bob share some resource prior to the Bell test. Suppose that Alice’s
information can be perfectly copied (while we make no such assumption on Bob’s). Prove that
the resulting behavior is local.


Exercise 2.4 


(a) Complete the list (2.27) by writing down all sixteen LD behaviors in Collins-Gisin 
representation. 

(b) For each, compute the value of the CH form (2.30). Compare it with the CHSH value 
given in Table 1.1 and provide a geometric interpretation of this observation. 


11 Unfortunately a large number of experimental papers have done exactly that over the years. I do not 
doubt that the data could have supported nonlocality (maybe up to loopholes), but the claim is not strictly 
substantiated. 
Bell Nonlocality in Quantum Theory


To avoid entanglements and interferences had long been one of her 
first principles. 
C.S. Lewis, That hideous strength 


While Bell nonlocality is defined and formalized without reference to quantum theory, 
it is this theory that describes accurately the known no-signaling nonlocal resources. This 
chapter introduces the study of nonlocality with the quantum formalism. 


3.1 Quantum Behaviors 


3.1.1 Quantum and nonlocality: A preamble 


In subsection 1.2.3 we made the point that Bell nonlocality is intriguing because it can be 
demonstrated with some no-signaling resources—or more precisely, resources that are 
accurately described by a formalism that implies no-signaling: Quantum theory. 

In order not to disrupt the flow of the study of nonlocality, I won’t pause in the main 
text to introduce this theory. Many readers will likely be very familiar with it. Elementary 
knowledge at the undergraduate level is assumed, while the more advanced notions are 
summarily introduced in Appendix C with references for further study. 

Quantum theory provides a further very important insight. Nonlocality is not associated
with specific physical degrees of freedom (the charge of the electron, the hyperfine
constant, the frequency of a light mode ...). What matters is the possibility of creating
and measuring suitable states of composite systems, described as entangled by the theory.
As long as one can manipulate those states, any physical degree of freedom will do.


3.1.2 Definitions 


In quantum theory, the shared resource is described as a quantum state shared among 
the players. A behavior P is obtained by each player performing a local measurement on 
their subsystem. 

The joint system of Alice and Bob is described by the tensor product Hilbert space
$\mathcal{H}_A \otimes \mathcal{H}_B$. A state $\rho$ is described by a positive Hermitian operator of trace 1. An $m_A$-output
measurement on Alice’s system is described by $\mathcal{M}_A = \{\Pi^A_a \,|\, a \in \mathcal{A}\}$, where the $\Pi^A_a$ are
positive Hermitian operators on $\mathcal{H}_A$ satisfying $\sum_a \Pi^A_a = \mathbb{I}$. An $m_B$-output measurement on
Bob’s system is described by $\mathcal{M}_B = \{\Pi^B_b \,|\, b \in \mathcal{B}\}$, where the $\Pi^B_b$ are analogously defined.


Notice that the measurements do not need to be projective: They can be generalized
measurements (a.k.a. positive-operator valued measures, POVMs).

A behavior $P$ is called quantum behavior if there exist a state $\rho$ and families of
measurements $\{\mathcal{M}^x_A \,|\, x \in \mathcal{X}\}$ and $\{\mathcal{M}^y_B \,|\, y \in \mathcal{Y}\}$, such that

$$P(a, b|x, y) = \mathrm{Tr}(\rho\, \Pi^x_a \otimes \Pi^y_b). \tag{3.1}$$


In chapter 2, from the definition of local behavior (2.16) we derived a compact 
characterization of the set of all such behaviors, namely the local polytope. The discussion 
of the set of all quantum behaviors is more complex. Although it could fit here, I will 
postpone it to chapter 6: I prefer the reader to first become familiar with several examples 
of nonlocality in quantum theory. 


3.1.3 Relations between states, measurements, and behaviors 


Bell nonlocality is defined on behaviors, while quantum theory is phrased in terms of 
states and measurements. Here we spell out some evident but important consequences 
of this difference. 

The states are the shared resources. In any Bell scenario, given a state, infinitely many
behaviors are possible because of the freedom of choosing the measurements. We can
introduce here the notion of state-behavior as the collection of the statistics of all possible
local measurements on $\rho$:

$$\mathcal{P}_Q(\rho) = \left\{ P(a, b|\vec{a}, \vec{b}) = \mathrm{Tr}\big(\rho\, \Pi_a(\vec{a}) \otimes \Pi_b(\vec{b})\big) \,\middle|\, a \in \mathcal{A},\ b \in \mathcal{B},\ \forall \vec{a}, \vec{b} \right\} \tag{3.2}$$

where $\vec{a}, \vec{b}$ parametrize the measurements. In principle, this parametrization should
include POVMs, but often for simplicity one restricts state-behaviors to projective
measurements (to practice this notion, see Exercise 3.1). States that are equivalent up to local
unitaries produce the same state-behavior. Indeed, if $\tilde{\rho} = U \rho\, U^\dagger$ with $U = U_A \otimes U_B$,
then $\mathrm{Tr}(\tilde{\rho}\, \Pi^x_a \otimes \Pi^y_b) = \mathrm{Tr}(\rho\, \tilde{\Pi}^x_a \otimes \tilde{\Pi}^y_b)$ where $\tilde{\Pi}^x_a = U_A^\dagger \Pi^x_a U_A$ and $\tilde{\Pi}^y_b = U_B^\dagger \Pi^y_b U_B$.

To prove that p is a nonlocal resource, it is sufficient to find the families of measure- 
ments that generate a nonlocal behavior. Conversely, to prove that p is not a nonlocal 
resource, one should prove that all the choices of measurements lead to behaviors that 
can be simulated with LVs. Thus, the assessment of nonlocality for a quantum resource 
p involves a priori a heuristic step: Finding the suitable measurements or constructing 
an LV model. That being said, the nonlocality of two classes of states has been clarified 
with generic arguments: 


• It follows immediately from the expression (3.1) that separable states can generate
only local behaviors. Indeed, inserting $\rho = \sum_k q_k\, \rho^k_A \otimes \rho^k_B$ into (3.1) one
obtains $P(a, b|x, y) = \sum_k q_k\, \mathrm{Tr}(\rho^k_A \Pi^x_a)\, \mathrm{Tr}(\rho^k_B \Pi^y_b)$, which is of the form (2.16) for all
measurements.

• We shall see in subsection 3.2.4 that all pure entangled states are nonlocal resources.
The proof is constructive and provides a choice of measurements that generate a
nonlocal behavior (without any claim of optimality, in any sense one may want to
give to this word).


As for the nonlocality of mixed entangled states, we shall review the approaches and main 
results in section 3.3. 

Let us now turn to the properties of the measurements. If a player uses a set of jointly-
measurable¹ POVMs, it will be possible to construct the joint probability distribution of
all their outputs. In this case, that player can’t contribute to a nonlocal behavior as proved
in subsection 2.3.4. Conversely, in order to create a nonlocal behavior, it is necessary
to use non-jointly-measurable POVMs. One may ask whether this is also sufficient, but
the answer is negative. Indeed, there are explicit examples of non-jointly-measurable
POVMs for one player, such that the resulting behavior will be local, irrespective of the
number and choice of measurements of the other player (Hirsch et al., 2018; Bene and
Vértesi, 2018).


3.2 CHSH in Quantum Theory 
3.2.1 The CHSH Bell operator 


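We shall use the notation $x, y \in \{0, 1\}$ for the inputs and $a, b \in \{-1, +1\}$ for the
outputs, and work with the standard form (2.29) of the CHSH inequality. The correlation
coefficients for given inputs (2.28) are now expressed as

$$\begin{aligned}
E_{xy} &= \mathrm{Tr}\big[\rho\, (\Pi^x_{+1} \otimes \Pi^y_{+1} + \Pi^x_{-1} \otimes \Pi^y_{-1} - \Pi^x_{+1} \otimes \Pi^y_{-1} - \Pi^x_{-1} \otimes \Pi^y_{+1})\big] \\
&= \mathrm{Tr}\big[\rho\, (\Pi^x_{+1} - \Pi^x_{-1}) \otimes (\Pi^y_{+1} - \Pi^y_{-1})\big] = \mathrm{Tr}[\rho\, A_x \otimes B_y],
\end{aligned} \tag{3.3}$$

where $A_x = \Pi^x_{+1} - \Pi^x_{-1}$ and $B_y = \Pi^y_{+1} - \Pi^y_{-1}$ are Hermitian operators of unspecified
dimensions, whose eigenvalues lie between $-1$ and $+1$, and are equal to these values in
the case of projective measurements. The value of the CHSH expression (2.29) in state
$\rho$ can be seen as the average value $S = \mathrm{Tr}(\rho\, \mathcal{S})$ of the CHSH operator

$$\mathcal{S} = A_0 \otimes B_0 + A_0 \otimes B_1 + A_1 \otimes B_0 - A_1 \otimes B_1. \tag{3.4}$$

In particular, a quantum state $\rho$ violates CHSH if and only if there exist measurements such
that $\mathrm{Tr}(\rho\, \mathcal{S}) > 2$.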

Before continuing, we stress that a Bell test does not consist of single-shot measurements
of a Bell operator. The Bell operator is self-adjoint, and as such it constitutes
a valid quantum observable. But it cannot be measured by separated players because
(as can be expected and we shall prove in the next subsection) some of its eigenstates
may be entangled. The measurements of the players are still described by the POVM
elements $\Pi^x_a$ and $\Pi^y_b$. The operators $A_x$, $B_y$ and $\mathcal{S}$ are derived from those and used
for mathematical convenience.


1 The textbook equivalence between joint measurability and commutation holds only for projective measurements.
For POVMs, commutation implies joint measurability but the converse is not true: There exist POVMs
whose elements do not commute but are nonetheless jointly measurable (more in Appendix C).


3.2.2 The Tsirelson bound 
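We proceed now to prove the famous Tsirelson² bound (Cirel’son, 1980):

$$S \le 2\sqrt{2} \quad \text{for quantum behaviors.} \tag{3.5}$$

The bound is derived as an upper bound for the largest eigenvalue of $\mathcal{S}$ in absolute
value, $\|\mathcal{S}\|_\infty = \max_\psi |\langle \psi | \mathcal{S} | \psi \rangle|$. A very direct proof follows from noticing³ that $\mathcal{S}$ can be
rewritten as

$$\mathcal{S} = \frac{1}{\sqrt{2}} \left( A_0^2 + A_1^2 + B_0^2 + B_1^2 \right) - \frac{\sqrt{2} - 1}{8}\, [\text{positive operator}], \tag{3.6}$$

where the positive operator is given by

$$\begin{aligned}
& \big( (\sqrt{2} + 1)(A_0 - B_0) + A_1 - B_1 \big)^2 + \big( (\sqrt{2} + 1)(A_0 - B_1) - A_1 - B_0 \big)^2 \\
& + \big( (\sqrt{2} + 1)(A_1 - B_0) + A_0 + B_1 \big)^2 + \big( (\sqrt{2} + 1)(A_1 + B_1) - A_0 - B_0 \big)^2 .
\end{aligned}$$

Since the second term of the sum is non-positive, we have
$\|\mathcal{S}\|_\infty \le \left\| \frac{1}{\sqrt{2}} (A_0^2 + A_1^2 + B_0^2 + B_1^2) \right\|_\infty$.
The bound (3.5) follows from $\|A_x\|_\infty \le 1$, $\|B_y\|_\infty \le 1$, through the triangle
inequality $\|A + B\|_\infty \le \|A\|_\infty + \|B\|_\infty$ and the fact that $\|A^2\|_\infty = \|A\|_\infty^2$ for normal
operators.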

A more elegant proof can be provided assuming that the measurements are projective.
In fact, no generality is lost in this assumption, because the dimensions $d_A$ and $d_B$ of
the Hilbert spaces of Alice and Bob are left unspecified: By Naymark’s theorem, any
POVM in a given dimension can be implemented as a projective measurement in a higher
dimension. For projective measurements, the eigenvalues of the $A_x$ and $B_y$ are exactly $+1$
and $-1$ (degenerate if $d_A, d_B > 2$); whence $\|A_x\|_\infty = \|B_y\|_\infty = 1$, $A_x^2 = \mathbb{I}_{d_A}$, and $B_y^2 = \mathbb{I}_{d_B}$.
With these equalities, the square of the CHSH operator reads

$$\mathcal{S}^2 = 4\, \mathbb{I}_{d_A} \otimes \mathbb{I}_{d_B} - [A_0, A_1] \otimes [B_0, B_1]. \tag{3.7}$$

Besides,

$$\|[A_0, A_1]\|_\infty = \|A_0 A_1 - A_1 A_0\|_\infty \le \|A_0 A_1\|_\infty + \|A_1 A_0\|_\infty \le 2 \|A_0\|_\infty \|A_1\|_\infty = 2,$$

where for the last estimate we have used the inequality $\|A_0 A_1\|_\infty \le \|A_0\|_\infty \|A_1\|_\infty$.
Similarly one finds $\|[B_0, B_1]\|_\infty \le 2$. The triangle inequality leads then to $\|\mathcal{S}^2\|_\infty \le 8$,
whence (3.5) follows.


2 The romanization of this name was changed from Cirel’son to Tsirelson in the 1980s.
3 It takes a Tsirelson to notice this.
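The operator identity (3.7) is easy to check numerically for qubit observables. The sketch below uses plain Python lists as matrices so that it needs no external library; the helper names (`spin`, `kron`, `identity_gap`, `singlet_chsh`) are illustrative. It assumes projective measurements $A_x = \hat{a}_x \cdot \vec{\sigma}$ and $B_y = \hat{b}_y \cdot \vec{\sigma}$ along unit vectors.

```python
import math

SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]


def spin(n):
    """Observable n . sigma for a unit vector n = (nx, ny, nz)."""
    return [[n[0] * SX[i][j] + n[1] * SY[i][j] + n[2] * SZ[i][j]
             for j in range(2)] for i in range(2)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def kron(A, B):
    """Kronecker (tensor) product, Alice's factor first."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def add(A, B, sign=1):
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def commutator(A, B):
    return add(matmul(A, B), matmul(B, A), -1)


def chsh_operator(A0, A1, B0, B1):
    """CHSH operator (3.4)."""
    S = add(kron(A0, B0), kron(A0, B1))
    S = add(S, kron(A1, B0))
    return add(S, kron(A1, B1), -1)


def identity_gap(A0, A1, B0, B1):
    """Largest entry of S^2 - (4*I - [A0,A1] x [B0,B1]); zero if (3.7) holds."""
    S = chsh_operator(A0, A1, B0, B1)
    lhs = matmul(S, S)
    comm = kron(commutator(A0, A1), commutator(B0, B1))
    return max(abs(lhs[i][j] - ((4 if i == j else 0) - comm[i][j]))
               for i in range(4) for j in range(4))


def singlet_chsh():
    """<Psi-|S|Psi-> for A0 = sigma_z, A1 = sigma_x, B0,1 = (z +/- x)/sqrt(2).
    The sign is negative because the singlet is anticorrelated."""
    r = 1 / math.sqrt(2)
    S = chsh_operator(spin((0, 0, 1)), spin((1, 0, 0)),
                      spin((r, 0, r)), spin((-r, 0, r)))
    psi = [0, r, -r, 0]  # (|01> - |10>)/sqrt(2)
    value = sum(psi[i] * S[i][j] * psi[j] for i in range(4) for j in range(4))
    return value.real
```

For any four unit vectors `identity_gap` vanishes up to rounding, and `singlet_chsh()` has absolute value $2\sqrt{2}$, saturating (3.5).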

Besides being more elegant, this second proof makes manifest that $[A_0, A_1] = 0$ or
$[B_0, B_1] = 0$ will imply $S \le 2$ (a special instance of the fact that joint measurability leads to
local behaviors). It is also the starting point to prove that the state and the measurements
that achieve (3.5) are essentially unique (this “self-testing” character will be presented
in section 7.1).


3.2.3 CHSH for two-qubit states and projective measurements 


The canonical example of violation of CHSH is that by the maximally entangled state of
two qubits. We have already written such a state in (1.5). Up to local unitaries, it can be
cast in the form of the singlet state

$$|\Psi^-\rangle = \frac{1}{\sqrt{2}} \big( |{+\hat{n}}\rangle |{-\hat{n}}\rangle - |{-\hat{n}}\rangle |{+\hat{n}}\rangle \big) \tag{3.8}$$

where the choice of the direction $\hat{n}$ that defines the bases does not matter because the
state is invariant by bilateral rotation. Projective measurements on a qubit are labeled
by a direction $\hat{m} \in S^2$, the surface of the unit sphere embedded in $\mathbb{R}^3$. Specifically,
$\mathcal{M}^{\hat{m}} = \{\Pi^{\hat{m}}_{+1}, \Pi^{\hat{m}}_{-1}\}$ with $\Pi^{\hat{m}}_s = \frac{1}{2}(\mathbb{I} + s\, \hat{m} \cdot \vec{\sigma})$ and where $\vec{\sigma}$ are the Pauli matrices.⁴
A standard calculation (Exercise 3.2) shows that

$$P(a, b|\hat{a}, \hat{b}) = \langle \Psi^- | \Pi^{\hat{a}}_a \otimes \Pi^{\hat{b}}_b | \Psi^- \rangle = \frac{1}{4} (1 - ab\, \hat{a} \cdot \hat{b}). \tag{3.9}$$

This means that the correlation coefficient (2.28) is $E_{\hat{a}\hat{b}} = -\hat{a} \cdot \hat{b}$. In the context of a
Bell test, for the sake of not carrying on a bothersome minus sign, there is no harm in
assuming that Bob systematically flips his output, or that he uses a reference frame that
is related to Alice’s by parity inversion. So, starting from a singlet state, one can generate
a behavior in which $E_{xy} = \hat{a}_x \cdot \hat{b}_y$, and the CHSH expression reads

$$S(\Psi^-) = \hat{a}_0 \cdot (\hat{b}_0 + \hat{b}_1) + \hat{a}_1 \cdot (\hat{b}_0 - \hat{b}_1). \tag{3.10}$$


4 Having used $x, y$ as inputs of Bell tests, I shall use the notation $\hat{x}$ etc., when a direction in space is meant.
For instance, the Pauli matrix that is usually written $\sigma_x$ will be written $\sigma_{\hat{x}}$.

Let’s proceed to find the measurement directions that maximize this value. We note
that the sum and difference of two unit vectors are orthogonal, so we can write
$\hat{b}_0 + \hat{b}_1 = 2 \cos\chi\, \hat{c}$ and $\hat{b}_0 - \hat{b}_1 = 2 \sin\chi\, \hat{c}^\perp$. The parameter $\chi$ is defined by $\hat{b}_0 \cdot \hat{b}_1 = \cos 2\chi$.
Given this observation, obviously the best choice to maximize $S(\Psi^-)$ is $\hat{a}_0 = \hat{c}$ and
$\hat{a}_1 = \hat{c}^\perp$. With this choice one has $S(\Psi^-) = 2(\cos\chi + \sin\chi)$. Finally, we can maximize
this expression over $\chi$ to find $S(\Psi^-) = 2\sqrt{2}$ for $\cos\chi = \sin\chi = \frac{1}{\sqrt{2}}$. So, the maximally
entangled state of two qubits saturates the Tsirelson bound (3.5) by generating the
behavior

$$P(a, b|x, y) = \frac{1}{4} \left( 1 + ab\, \frac{(-1)^{xy}}{\sqrt{2}} \right). \tag{3.11}$$

That gives the correlators $E_{xy} = \frac{(-1)^{xy}}{\sqrt{2}}$. Working back through the optimization, we
can find the measurement directions that must be chosen to reach that maximum. One
of them, say $\hat{a}_0$, is arbitrary, since the state is invariant by rotation; then $\hat{a}_1$ is any direction
orthogonal to $\hat{a}_0$. Having thus chosen Alice’s measurement directions, Bob’s are given
by $\hat{b}_0 = \frac{1}{\sqrt{2}} (\hat{a}_0 + \hat{a}_1)$ and $\hat{b}_1 = \frac{1}{\sqrt{2}} (\hat{a}_0 - \hat{a}_1)$, so they are also orthogonal.
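As a quick numerical check of this optimization, the sketch below evaluates (3.10) with $E_{xy} = \hat{a}_x \cdot \hat{b}_y$ for the settings just described and scans the one-parameter family $S = 2(\cos\chi + \sin\chi)$. The particular choice $\hat{a}_0 = \hat{z}$, $\hat{a}_1 = \hat{x}$ and the function names are assumptions made for illustration.

```python
import math


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def chsh_singlet(a0, a1, b0, b1):
    """Eq. (3.10) with E_xy = a_x . b_y (Bob's outputs already flipped)."""
    b_sum = [p + q for p, q in zip(b0, b1)]
    b_diff = [p - q for p, q in zip(b0, b1)]
    return dot(a0, b_sum) + dot(a1, b_diff)


def optimal_settings():
    """a0 arbitrary (here z), a1 orthogonal to it (here x),
    b0 = (a0 + a1)/sqrt(2), b1 = (a0 - a1)/sqrt(2)."""
    a0, a1 = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
    r = 1 / math.sqrt(2)
    b0 = tuple(r * (p + q) for p, q in zip(a0, a1))
    b1 = tuple(r * (p - q) for p, q in zip(a0, a1))
    return a0, a1, b0, b1


def scan_chi(steps=1000):
    """Largest value of 2(cos chi + sin chi) on a grid over [0, pi/2]."""
    return max(2 * (math.cos(k * math.pi / (2 * steps))
                    + math.sin(k * math.pi / (2 * steps)))
               for k in range(steps + 1))
```

`chsh_singlet(*optimal_settings())` reproduces $2\sqrt{2}$, and the grid scan never exceeds that value.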

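As it turns out, with not too much additional effort, one can derive the maximal
value of the CHSH expression for any two-qubit state, pure or mixed, under projective
measurements (Horodecki, Horodecki, and Horodecki, 1995). Indeed, any two-qubit state
is of the form

$$\rho = \frac{1}{4} \Big[ \mathbb{I} \otimes \mathbb{I} + (\vec{r} \cdot \vec{\sigma}) \otimes \mathbb{I} + \mathbb{I} \otimes (\vec{s} \cdot \vec{\sigma}) + \sum_{i,j = \hat{x}, \hat{y}, \hat{z}} T_{ij}\, \sigma_i \otimes \sigma_j \Big]. \tag{3.12}$$

With this notation, $E_{xy} = \hat{a}_x \cdot T_\rho \hat{b}_y$ where $T_\rho$ is the $3 \times 3$ matrix with entries $T_{ij}$; so the
CHSH expression reads

$$S(\rho) = \hat{a}_0 \cdot T_\rho (\hat{b}_0 + \hat{b}_1) + \hat{a}_1 \cdot T_\rho (\hat{b}_0 - \hat{b}_1). \tag{3.13}$$

Since $\|\hat{a}_0\| = 1$, the maximal value of $\hat{a}_0 \cdot T_\rho \hat{c}$ is $\|T_\rho \hat{c}\|$, obtained when $\hat{a}_0$ is chosen
parallel to $T_\rho \hat{c}$. By the same reasoning on the term involving $\hat{a}_1$, we reach

$$\begin{aligned}
\max_{\hat{a}_0, \hat{a}_1, \hat{b}_0, \hat{b}_1} S(\rho) &= \max_{\hat{b}_0, \hat{b}_1} 2 \left( \cos\chi\, \|T_\rho \hat{c}\| + \sin\chi\, \|T_\rho \hat{c}^\perp\| \right) \\
&= \max_{\hat{c}, \hat{c}^\perp} 2 \sqrt{\|T_\rho \hat{c}\|^2 + \|T_\rho \hat{c}^\perp\|^2}
\end{aligned}$$

where in the last step we used the well-known optimization $\max_\chi (x \cos\chi + y \sin\chi) = \sqrt{x^2 + y^2}$,
achievable with the choice $\cos\chi = x / \sqrt{x^2 + y^2}$. Finally, $\|T_\rho \hat{c}\|^2 = \hat{c} \cdot T_\rho^T T_\rho\, \hat{c}$
where $T_\rho^T$ is the transpose of $T_\rho$. Now, $T_\rho^T T_\rho$ is a positive symmetric matrix, therefore
the maximization is achieved by choosing $\hat{c}$ and $\hat{c}^\perp$ as the two eigenvectors associated to
its two largest eigenvalues $\lambda_1$ and $\lambda_2$:

$$\max_{\hat{a}_0, \hat{a}_1, \hat{b}_0, \hat{b}_1} S(\rho) = 2 \sqrt{\lambda_1 + \lambda_2}. \tag{3.14}$$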


It is instructive, and useful for what follows, to work out the case of pure two-qubit states
more explicitly. By the Schmidt decomposition, any two-qubit pure state is equivalent up
to local unitaries to

$$|\psi(\theta)\rangle = \cos\theta\, |{+\hat{z}}\rangle |{+\hat{z}}\rangle + \sin\theta\, |{-\hat{z}}\rangle |{-\hat{z}}\rangle, \quad \theta \in \left[0, \tfrac{\pi}{4}\right]. \tag{3.15}$$

Thus,

$$T_{\psi(\theta)} = \begin{pmatrix} \sin 2\theta & 0 & 0 \\ 0 & -\sin 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{3.16}$$

The maximal achievable value of CHSH is

$$\max_{\hat{a}_0, \hat{a}_1, \hat{b}_0, \hat{b}_1} S(\psi(\theta)) = 2 \sqrt{1 + \sin^2 2\theta}, \tag{3.17}$$

which is always larger than 2, unless $\sin 2\theta = 0$. Therefore, all pure entangled states of
two qubits violate CHSH for suitable measurements, and only maximally entangled ones
saturate the Tsirelson bound. Let us determine the corresponding measurement settings.
The eigenvector associated to the largest eigenvalue of $T_\rho^T T_\rho$ is $\hat{c} = \hat{z}$; the orthogonal
subspace being degenerate, we can choose any vector in it as $\hat{c}^\perp$; in other words, there
are many different choices of measurement directions leading to the maximal violation.
Let us choose $\hat{c}^\perp = \hat{x}$. With this choice,

$$\hat{b}_{0,1} = \cos\chi\, \hat{z} \pm \sin\chi\, \hat{x} \quad \text{with} \quad \cos\chi = 1 / \sqrt{1 + \sin^2 2\theta}. \tag{3.18}$$

Furthermore, $\hat{a}_0$ must be the unit vector parallel to $T_\rho \hat{c}$, and $\hat{a}_1$ to $T_\rho \hat{c}^\perp$, so here

$$\hat{a}_0 = \hat{z}, \quad \hat{a}_1 = \hat{x}. \tag{3.19}$$

In summary, Alice’s measurement directions are orthogonal, $\hat{a}_0$ being the direction that
defines the Schmidt basis ($\hat{z}$ in our convention); Bob’s two measurement directions are
not orthogonal, but are symmetrical around the same direction and lie in the same plane
as Alice’s.
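The settings (3.18) and (3.19) can be checked against the closed form (3.17). The sketch below assumes the diagonal correlation matrix (3.16) and uses $\sin\chi = \sin 2\theta / \sqrt{1 + \sin^2 2\theta}$, which follows from the stated value of $\cos\chi$ for $\chi$ in the first quadrant; the function names are illustrative.

```python
import math


def max_chsh_pure(theta):
    """Closed form (3.17)."""
    return 2 * math.sqrt(1 + math.sin(2 * theta) ** 2)


def chsh_with_settings(theta):
    """CHSH value (3.13) for the state (3.15) with settings (3.18)-(3.19)."""
    s = math.sin(2 * theta)
    t_diag = (s, -s, 1.0)                 # diagonal of T in the (x, y, z) basis
    norm = math.sqrt(1 + s * s)
    cos_chi, sin_chi = 1 / norm, s / norm
    a0, a1 = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)   # z and x
    b0 = (sin_chi, 0.0, cos_chi)                # cos(chi) z + sin(chi) x
    b1 = (-sin_chi, 0.0, cos_chi)               # cos(chi) z - sin(chi) x

    def corr(a, b):
        # E = a . T b for a diagonal T
        return sum(ai * ti * bi for ai, ti, bi in zip(a, t_diag, b))

    return corr(a0, b0) + corr(a0, b1) + corr(a1, b0) - corr(a1, b1)
```

For every $\theta \in [0, \pi/4]$ the two functions agree, and the value exceeds 2 as soon as $\theta > 0$.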
3.2.4 All pure bipartite entangled states violate CHSH


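As an extension of the previous result, let us now prove that every bipartite pure entangled state
(i.e., of any dimensionality) violates CHSH. Any $d$-dimensional bipartite pure state can
be written in its Schmidt decomposition $|\Psi\rangle = \sum_{k=1}^{d} c_k\, |k\rangle |k\rangle$, $c_k \in \mathbb{R}^+$, where the
bi-orthogonal basis is chosen such that $c_1 \ge c_2 \ge \dots \ge c_d \ge 0$. The state is entangled if and
only if $c_2 \neq 0$. Let us rewrite the state as

$$\begin{aligned}
|\Psi\rangle &= \sum_{j=1}^{\lceil d/2 \rceil} \sqrt{p_j}\, \big( \cos\theta_j\, |2j - 1\rangle |2j - 1\rangle + \sin\theta_j\, |2j\rangle |2j\rangle \big) \\
&= \bigoplus_{j=1}^{\lceil d/2 \rceil} \sqrt{p_j}\, |\psi(\theta_j)\rangle
\end{aligned} \tag{3.20}$$

where $p_j = c_{2j-1}^2 + c_{2j}^2$, $\cos\theta_j = \frac{c_{2j-1}}{\sqrt{p_j}}$ and $\sin\theta_j = \frac{c_{2j}}{\sqrt{p_j}}$ (if $d$ is odd, $\sin\theta_{\lceil d/2 \rceil} \propto c_{d+1} = 0$).

From the previous subsection, we know that CHSH can take the value⁵

$$S(\Psi) = \sum_{j=1}^{\lceil d/2 \rceil} p_j\, 2 \sqrt{1 + \sin^2 2\theta_j} \tag{3.21}$$

with measurements of the form

$$A_x = \bigoplus_{j=1}^{\lceil d/2 \rceil} \hat{a}_{x,j} \cdot \vec{\sigma}_j, \quad B_y = \bigoplus_{j=1}^{\lceil d/2 \rceil} \hat{b}_{y,j} \cdot \vec{\sigma}_j \tag{3.22}$$

and the suitable choice of unit vectors in each subspace. Since $\sum_j p_j = 1$, $S(\Psi) > 2$ as
soon as $c_2 > 0$.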

All in all, we have proved that all pure bipartite entangled states are nonlocal resources. 
This is often referred to as “Gisin’s theorem,” since Nicolas Gisin was the first to ask 
the question and to answer it for bipartite states (Gisin, 1991) with a proof similar to 
the one we have just given. Shortly after, Popescu and Rohrlich (1992a), who were 
working independently on the same line, extended the proof to multipartite states (see 
section 5.1.2). 
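The value (3.21) is straightforward to evaluate from the Schmidt coefficients. The following sketch assumes normalized, non-increasing coefficients and uses $\sin 2\theta_j = 2 c_{2j-1} c_{2j} / p_j$, which follows from the definitions below (3.20); the function name is illustrative.

```python
import math


def chsh_from_schmidt(coeffs):
    """Value (3.21) reached by the block construction (3.20)-(3.22).
    coeffs: Schmidt coefficients c_1 >= c_2 >= ... >= 0 with sum c_k^2 = 1."""
    c = list(coeffs)
    if len(c) % 2:          # odd dimension: pad with c_{d+1} = 0
        c.append(0.0)
    total = 0.0
    for j in range(0, len(c), 2):
        p = c[j] ** 2 + c[j + 1] ** 2
        if p == 0:
            continue        # empty block, contributes nothing
        sin_2theta = 2 * c[j] * c[j + 1] / p
        total += p * 2 * math.sqrt(1 + sin_2theta ** 2)
    return total
```

A product state (`[1.0, 0.0]`) gives exactly 2, while any state with $c_2 > 0$ gives a value above 2; for the maximally entangled state of two qutrits the construction yields $(4\sqrt{2} + 2)/3 \approx 2.55$.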


5 For the current purposes, a lower bound is sufficient. The maximal violation of CHSH by an arbitrary 
pure state is not known analytically beyond the case of two qubits; however, there is strong numerical evidence 
that the lower bound used here is actually tight. 
3.3 Mixed Entangled States as Nonlocal Resources


While all pure entangled states are nonlocal resources, the case of mixed entangled states 
is more complex. This is not surprising, since the same happens in entanglement theory. 
At a glance, the situation is as follows. 

As first proved by Reinhard Werner (1989), some mixed entangled states do not 
exhibit any single-copy nonlocality: That is, any quantum behavior (3.1) obtainable by 
measuring them is local. The original example is presented in subsection 3.3.1; a few 
other families of such states have been found since then, but we don’t have any general 
criterion [see (Augusiak et al., 2014) for a review]. 

However, this is not the end of the story. If the players share a source that produces 
the entangled state p, which is local for single-copy measurements, they can try to create 
a nonlocal resource by many-copy processing. Subsection 3.3.2 reviews these protocols. 


3.3.1 Single-copy nonlocality: Werner states 


Consider the single-parameter family of two-qubit states (“Werner states”) defined as

$$\rho_W = W\, |\Psi^-\rangle \langle \Psi^-| + (1 - W)\, \frac{\mathbb{I}}{4}, \quad W \in \left[ -\tfrac{1}{3}, 1 \right]. \tag{3.23}$$

The state-behavior that describes the statistics of all possible projective measurements
on $\rho_W$ is

$$\mathcal{P}_Q(W) = \left\{ P(a, b|\hat{a}, \hat{b}) = \frac{1}{4} (1 - ab\, W\, \hat{a} \cdot \hat{b}) \,\middle|\, a, b \in \{-1, +1\},\ \hat{a}, \hat{b} \in S^2 \right\}. \tag{3.24}$$

Using the criterion of negative partial transposition, it can be proved that Werner states
are separable for $W \le \frac{1}{3}$ and entangled otherwise (see Appendix C). Nonetheless, the
behavior $\mathcal{P}_Q(W)$ can be reproduced with LVs for $W$ well within the range of entangled
states. Currently, we know that a LV model is possible for $W \le 0.6829$ (Hirsch et al.,
2017) while nonlocality is guaranteed for $W > 0.7056$ (Vértesi, 2008), only slightly
better than the bound that is immediately derivable from CHSH.⁶

As a proof, we present a simple rendering of Werner’s original argument by Popescu 
(1994), which covers the range W ≤ 1/2. It is sufficient to find the LV model for W = 1/2, 
since any $\rho_W$ with W < 1/2 can be obtained by mixing $\rho_{W=1/2}$ with white noise. In each 
round, Alice and Bob pre-share a classical variable in the form of a vector $\hat\lambda$ drawn from 
the surface of the unit sphere $S^2$ with uniform distribution $\rho(\hat\lambda)\,d\hat\lambda = \frac{1}{4\pi}\sin\theta\,d\theta\,d\varphi$ with 
the usual spherical coordinates. Alice draws her output by simulating the statistics of a 
measurement of a single spin prepared in the direction $\hat\lambda$: 


6 The value of the CHSH parameter for a Werner state is $2\sqrt{2}\,W$, which implies nonlocality for $W > 1/\sqrt{2} \approx 0.7071$. 
Vértesi proved that the reported minor improvement can be achieved, but one has to use inequalities 
with $M_A, M_B = 465$ inputs per player! 
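$$P_{\hat\lambda}(a|\hat a) = \tfrac{1}{2}\left(1 + a\,\hat a\cdot\hat\lambda\right). \tag{3.25}$$

Bob outputs deterministically $b = -\mathrm{sign}(\hat b\cdot\hat\lambda)$ (i.e., b = +1 if $\hat b\cdot\hat\lambda < 0$, b = −1 if $\hat b\cdot\hat\lambda > 0$). 
Thus 

$$P(a,+1|\hat a,\hat b) = \int_{\hat b\cdot\hat\lambda<0} d\hat\lambda\,\rho(\hat\lambda)\,P_{\hat\lambda}(a|\hat a) = \frac{1}{4} + \frac{a}{2}\int_{\hat b\cdot\hat\lambda<0} d\hat\lambda\,\rho(\hat\lambda)\,\hat a\cdot\hat\lambda. \tag{3.26}$$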


In order to compute the integral, we choose the spherical coordinates such that $\hat b$ is $\hat z$ 
(i.e., θ = 0): 

$$\int_{\hat b\cdot\hat\lambda<0} d\hat\lambda\,\rho(\hat\lambda)\,\hat a\cdot\hat\lambda = \frac{1}{4\pi}\int_{\pi/2}^{\pi} d\theta\,\sin\theta \int_0^{2\pi} d\varphi\,\big[(a_x\cos\varphi + a_y\sin\varphi)\sin\theta + a_z\cos\theta\big] = \frac{1}{2}\,a_z\int_{\pi/2}^{\pi} d\theta\,\sin\theta\cos\theta = -\frac{1}{4}\,a_z.$$

Inserting this result into (3.26) and recalling that $a_z = \hat a\cdot\hat b$, we recover indeed (3.24) with 
W = 1/2 for b = +1. The calculation for b = −1 changes only in the bounds of the last integral and 
yields the desired result too. 
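
The model just described is simple enough to be sampled directly. The following is a minimal sketch, not taken from the original references: it draws the shared vector uniformly on the sphere, applies Alice's rule (3.25) and Bob's deterministic rule, and compares the estimated frequencies with (3.24) at W = 1/2. The function names, the number of rounds and the choice of directions are illustrative assumptions.

```python
import math
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def random_unit_vector(rng):
    """Uniform point on the unit sphere, obtained by normalizing three Gaussians."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-12:
            return [c / norm for c in v]


def simulate_lv_model(a_hat, b_hat, n_rounds=100_000, seed=0):
    """Estimate P(a, b | a_hat, b_hat) by sampling the LV model of (3.25)-(3.26)."""
    rng = random.Random(seed)
    counts = {(a, b): 0 for a in (+1, -1) for b in (+1, -1)}
    for _ in range(n_rounds):
        lam = random_unit_vector(rng)
        # Alice: a = +1 with probability (1 + a_hat . lam) / 2, as in (3.25).
        a = +1 if rng.random() < 0.5 * (1.0 + dot(a_hat, lam)) else -1
        # Bob: deterministic output b = -sign(b_hat . lam).
        b = +1 if dot(b_hat, lam) < 0 else -1
        counts[(a, b)] += 1
    return {key: c / n_rounds for key, c in counts.items()}


def werner_behavior(a_hat, b_hat, w):
    """Target statistics (3.24) for a two-qubit Werner state of weight w."""
    return {(a, b): 0.25 * (1.0 - a * b * w * dot(a_hat, b_hat))
            for a in (+1, -1) for b in (+1, -1)}


# Example (illustrative choice): two directions in the x-z plane, one radian apart.
a_hat = [0.0, 0.0, 1.0]
b_hat = [math.sin(1.0), 0.0, math.cos(1.0)]
estimate = simulate_lv_model(a_hat, b_hat)
target = werner_behavior(a_hat, b_hat, w=0.5)
for key in sorted(target):
    print(key, round(estimate[key], 3), round(target[key], 3))
```

The agreement is only statistical: with 100 000 rounds the fluctuations of each estimated probability are of the order of 0.001.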

The state-behavior (3.24) does not cover all possible statistics that can be obtained 
by measuring $\rho_W$, because it is restricted to projective measurements. The idea of 
adding POVMs to the state-behavior was first introduced by Barrett (2002), who 
found a LV description up to W ≤ 5/12 ≈ 0.42. At the moment of writing, this has been 
extended up to W ≤ (2/3) × 0.6829 ≈ 0.455 (Hirsch et al., 2017; Oszmaniec et al., 2017). 
It is not yet known whether POVMs actually lead to nonlocal behaviors in the range 
0.455 < W < 0.6829. 


3.3.2 Many-copy nonlocality 


We start with the result that is simplest to state: There exist states ρ that have a LV 
model, but such that $\rho^{\otimes N}$ is a nonlocal resource without further processing. This has 
been called super-activation of nonlocality. It was first found in network configurations 
(i.e., with the N copies of the bipartite ρ shared among more than two players), but the 
most elegant result proves it for only two players (Palazuelos, 2012). It is slightly too 
technical to be presented here, and the reader can refer to the original reference, which is 
self-contained. 
Now, if Alice and Bob share $\rho^{\otimes N}$, they can submit this resource to some processing: 
Any processing performed before receiving the inputs is valid, as it constitutes merely the 
preparation of a new resource. Two types of interesting processes⁷ have been considered. 

The first is entanglement distillation: If the state ρ is distillable, then, given sufficiently 
many copies, with local operations and classical communication Alice and Bob can come 
arbitrarily close to sharing a maximally entangled state. Thus, any distillable state is 
a nonlocal resource for many-copy processing (Peres, 1996): In particular, so are all 
entangled Werner states. As for non-distillable states, their nonlocality has been elusive, 
with many negative results leading to the conjecture that no such state could be a nonlocal 
resource (“Peres’ conjecture”). Eventually, Vértesi and Brunner (2014) refuted this 
conjecture by providing an example of a bound entangled state that is nonlocal in a 
single-copy setting. 

The second is filtering, also known as “revealing hidden nonlocality” from the title of 
the paper that introduced it (Popescu, 1995). Consider the Werner states for higher 
dimension 

$$\rho^{(d)}_{W'} = W'\,\frac{2\,\Pi_{\rm anti}}{d(d-1)} + (1-W')\,\frac{\mathbb{1}}{d^2}, \tag{3.27}$$

where $\Pi_{\rm anti}$ is the projector on the antisymmetric subspace. Werner (1989) had proved 
that these states are local for projective measurements if $W' \le \frac{d-1}{d}$. The filtering we 
consider is the projection, both on Alice and on Bob, in a two-dimensional subspace: 
$F_A = F_B = |0\rangle\langle 0| + |1\rangle\langle 1|$. The unnormalized state resulting from a successful filtering 
reads $F_A F_B\,\rho^{(d)}_{W'}\,F_A F_B = W'\,\frac{2}{d(d-1)}\,|\Psi^-\rangle\langle\Psi^-| + (1-W')\,\frac{\mathbb{1}_4}{d^2}$. The normalized state is therefore a 
two-qubit Werner state (3.23) with 

$$W = \frac{1}{1 + 2\,\frac{d-1}{d}\,\frac{1-W'}{W'}}. \tag{3.28}$$

Let us then set $W' = \frac{d-1}{d}$. The initial two-qudit state is local, but the filtered state violates 
CHSH if $d > \frac{2}{\sqrt{2}-1} \approx 4.8$ (i.e., for all d ≥ 5). Notice that the filter is applied to each copy 
separately, but when it fails that copy is discarded: Thus, as a preparation of a resource, 
it is indeed a many-copy processing. 
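
The threshold on d can be checked with a few lines of arithmetic. The sketch below evaluates (3.28) at $W' = (d-1)/d$ and compares the resulting CHSH value $2\sqrt{2}\,W$ with the local bound 2; the function names are mine and serve only the illustration.

```python
import math


def filtered_werner_weight(d, w_prime):
    """Weight W of the two-qubit Werner state obtained after filtering, as in (3.28)."""
    return 1.0 / (1.0 + 2.0 * (d - 1) / d * (1.0 - w_prime) / w_prime)


def chsh_value(w):
    """CHSH parameter of a two-qubit Werner state of weight w."""
    return 2.0 * math.sqrt(2.0) * w


# Start from the boundary of the range covered by Werner's local model.
table = {}
for d in range(2, 9):
    w = filtered_werner_weight(d, (d - 1) / d)
    table[d] = (w, chsh_value(w))
    print(d, round(w, 4), round(chsh_value(w), 4), chsh_value(w) > 2.0)
```

With this choice of $W'$ the filtered weight simplifies to W = d/(d + 2), which is how the condition on d quoted above arises.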

Popescu’s proof considers initial states whose locality is proved only for projective 
measurements. Later, the activation of nonlocality through filtering was also proved 
for initial states, whose state-behavior is local for the most general local measurements, 
including sequential ones (Hirsch et al., 2013). Also, an extension of Popescu’s hidden 
nonlocality idea provides the only example known to date, in which any bipartite 
entangled state leads to nonlocality: If ρ is a bipartite entangled state, there exists an 


7 A process allowed by quantum theory is: Throw the state away and replace it with a better one. Obviously, 
it would work ... 
entangled bipartite state ρ′ that does not violate CHSH even after filtering, such that 
ρ ⊗ ρ′ violates CHSH after filtering (Masanes, Liang, and Doherty, 2008). 


EXERCISES 


Exercise 3.1 Prove that the state-behavior associated to all projective measurements on a 
pure, non-maximally entangled two-qubit state (3.15) is 

$$P(a,b|\hat a,\hat b) = \tfrac{1}{4}\Big[1 + c\,(a\,a_z + b\,b_z) + ab\,\big(a_z b_z + s\,(a_x b_x - a_y b_y)\big)\Big] \tag{3.29}$$

where c = cos 2θ, s = sin 2θ, and $\hat a$, $\hat b$ are the directions associated to the measurements. 


Exercise 3.2 Derive the statistics (3.9) for the results of projective measurements on the singlet 
state. Hint: Though not necessary, you may exploit the fact that the singlet state is invariant 
under bilateral rotation. 
Review of Bipartite Bell Scenarios 


Tout ne va pas aussi mal que si ça allait pire. 
Things are not as bad as if they were worse. 


Attributed to Coluche 


This chapter reviews the main examples of bipartite Bell scenarios beyond the (2, 2; 2, 2) 
scenario. 


4.1 Unanticipated Complexity 


Our knowledge of the (2, 2; 2, 2) Bell scenario paints a simple picture: Up to obvious rela- 
belings, there is only one Bell inequality, CHSH; its maximal violation in quantum theory 
is rather easily computed, and is achieved by suitable measurements on the maximally 
entangled state of two qubits. One may hope that other Bell scenarios can also be studied 
thoroughly with relative ease and confirm the simplicity of the picture. Unfortunately, 
that is not the case. Here is a sample of what may, and actually does, happen: 


• When moving away from (2, 2; 2, 2), the number of inequivalent Bell inequalities 
grows quickly, and their quantum bounds are usually not as straightforward to 
obtain as the Tsirelson bound. 


• A frequent abuse of language, inspired by the CHSH intuition and perpetuated by 
the unfortunate title of some pioneering papers, consists in calling “Bell inequalities 
for qudits” the inequalities of a Bell scenario with m=d outputs per player. The 
implicit idea is that projective measurements on d-dimensional systems should 
exhaust what quantum theory can achieve with d outputs. However, this proves to 
be wrong in general: The maximal violation of several inequalities requires systems 
with dimension strictly larger than the alphabet of the outputs. 


• For several inequalities, the maximal quantum violation is achieved by measurements 
on a non-maximally entangled state. This is arguably the most counterintuitive 
discovery in the quantum theory of Bell nonlocality.¹ 


1 It is known from entanglement theory that one can deterministically create any state starting from the 
maximally entangled one, using local operations and classical communication. Thus, if the players can produce 
the maximally entangled state, they can always process it into the state of their choice prior to receiving the 
reviewer’s queries. 
• Bell scenarios with larger alphabets contain all those with smaller alphabets; but 
it is not a priori known whether, in order to answer a given question, considering larger 
scenarios is useful or even needed. For instance, the nonlocality of arbitrary pure 
states can be assessed with CHSH and its simple variations (section 3.2.4), and no 
significant improvement on the CHSH threshold for the detection loophole was 
found (Appendix B.3). New inequalities may be “relevant” if they are violated 
by some mixed states, whose nonlocality was missed by all previously known 
inequalities. Only a few such examples are known² and there seems to be no 
systematic way of addressing this question. 


In what follows, a large number of results are merely stated. In some cases, the proofs 
are very specific to each case and the interested reader can directly refer to the relevant 
article. Other results can be derived with tools that I have chosen to present in Part II 
of this book, namely the Navascués-Pironio-Acin (NPA) hierarchy of approximations 
(chapter 6) and the techniques of self-testing (chapter 7). 


4.2 Several Inputs, Two Outputs 


Among the Bell scenarios (M, 2; M, 2) with M > 2, the local polytope has been fully 
solved only for M = 3 (see next) and M = 4 (Zambrini, Cruzeiro, and Gisin, 2019). For 
M=5 up to 10, large incomplete lists of inequalities can be found in the literature, but 
to the best of my knowledge none is worth stressing in this book. 


4.2.1 The (3, 2; 3, 2) scenario and the inequality $I_{3322}$ 


The local polytope for the (3, 2; 3, 2) Bell scenario was first solved by Froissart (1981), 
then again several years later (Pitowsky and Svozil, 2001; Sliwa, 2003; Collins and Gisin, 
2004). It has 684 facets. We expect to find 36 positivity facets. We also expect a certain 
number of liftings of CHSH: Specifically, Alice and Bob have each 3 ways to choose 
two inputs out of three, and the relabeling symmetries will generate 8 versions for each 
such choice, so we expect 3 x 3 x 8= 72 CHSH-like facets. Indeed, all these are found. 
The remaining 576 non-trivial facets are all equivalent, up to relabeling, to a single new 
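inequality, denoted $I_{3322}$. In the Collins-Gisin representation, one of its versions reads 

$$I_{3322}\cdot\mathcal{P} \le 0 \quad\text{with}\quad I_{3322} = \begin{array}{c||c|c|c} & -1 & 0 & 0 \\ \hline\hline -2 & 1 & 1 & 1 \\ \hline -1 & 1 & 1 & -1 \\ \hline 0 & 1 & -1 & 0 \end{array} \tag{4.1}$$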


2 The first was the inequality $I_{3322}$ that is presented in subsection 4.2.1. In subsection 3.3.1 we have already 
mentioned Vértesi’s striking discovery that one has to go to $M_A, M_B = 465$ in order to detect the nonlocality 
of some Werner states (Vértesi, 2008). 
There is a lot to say about this simple inequality. First, contrary to CHSH, it cannot be 
rewritten in a way that involves only the correlation coefficients $E_{xy}$ (in compact jargon, 
it is not a “correlation inequality”). It is mildly surprising that an optimal criterion for a 
Bell test takes into account marginal information in a non-trivial way, given that what is 
hard for the players to generate are the correlations, not their local biases. 
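
The local bound of this version of $I_{3322}$ can be confirmed by brute force over the 64 local deterministic behaviors. The sketch below assumes the standard Collins-Gisin table (left column and top row: coefficients of the marginals; inner block: coefficients of the joint probabilities of the listed output) and assigns the left column to Alice; both the attribution and the variable names are assumptions of the illustration.

```python
from itertools import product

# Collins-Gisin table of I3322 (standard form, assumed here).
ALICE_MARGINALS = (-2, -1, 0)   # coefficients of P_A(listed output | x)
BOB_MARGINALS = (-1, 0, 0)      # coefficients of P_B(listed output | y)
JOINT = ((1, 1, 1),
         (1, 1, -1),
         (1, -1, 0))            # coefficients of P(listed, listed | x, y)


def i3322_value(alpha, beta):
    """Value of I3322 on an LD behavior.

    alpha[x] = 1 if Alice gives the listed output on input x, else 0;
    beta[y] plays the same role for Bob.
    """
    value = sum(c * p for c, p in zip(ALICE_MARGINALS, alpha))
    value += sum(c * p for c, p in zip(BOB_MARGINALS, beta))
    value += sum(JOINT[x][y] * alpha[x] * beta[y]
                 for x in range(3) for y in range(3))
    return value


local_values = [i3322_value(alpha, beta)
                for alpha in product((0, 1), repeat=3)
                for beta in product((0, 1), repeat=3)]
local_bound = max(local_values)
print("local bound:", local_bound)
print("saturating LD behaviors:", local_values.count(local_bound))
```

Since every local behavior is a convex mixture of LD behaviors, the maximum over these 64 points is the local bound.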

Much more surprising is the behavior of the maximal quantum violation. With 
measurements on two-qubit states, at most $I_{3322}\cdot\mathcal{P} = \frac{1}{4}$ can be achieved, for suitable 
measurements on the maximally entangled state (Moroder et al., 2013). Extensive 
numerical searches did not find any improvement using states of two qudits up to 
d = 11; then suddenly, for d = 12, higher violations are found, approaching the upper 
bound $I_{3322}\cdot\mathcal{P} \le 0.25087538$ set by the NPA hierarchy (Pal and Vértesi, 2010). An 
analytical optimization is still elusive at the moment of writing, but it is conjectured that 
the maximum is reached only in the limit d → ∞. 

Further, Collins and Gisin (2004) reported an example of a family of mixed states 
that certainly don’t violate CHSH (as this can be tested using the criterion described in 
subsection 3.2.3) but that do violate $I_{3322}$. It is the first example beyond CHSH of the 
relevance of a Bell inequality for the detection of the nonlocality of quantum states. As the 
reader probably expects by now, nobody has been able to parametrize all the states that 
violate $I_{3322}$, not even for two qubits. This limitation hinders the study of the relevance 
of inequalities with even larger number of inputs. 


4.2.2 The Mermin outreach criterion 


In subsection 1.4.1, we presented Mermin’s outreach criterion, a Bell test in the 
(3, 2; 3, 2) scenario relying strongly on some correlations being perfect. The behavior 


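$$\mathcal{P} = \begin{array}{c||c|c|c} & 1/2 & 1/2 & 1/2 \\ \hline\hline 1/2 & 1/2 & 1/8 & 1/8 \\ \hline 1/2 & 1/8 & 1/2 & 1/8 \\ \hline 1/2 & 1/8 & 1/8 & 1/2 \end{array} \tag{4.2}$$

for which indeed $P(a_i = b_i) = 1$ and $P(a_i = b_{j\neq i}) = 1/4$, gives O = 4.5, which can be 
proved to be the quantum minimum (Exercise 6.2). This behavior may be obtained 
by measuring the state $|\Phi^+\rangle$ along the directions $\hat a_1 = \hat b_1 = \hat z$, $\hat a_2 = \hat b_2 = \frac{\sqrt{3}}{2}\hat x - \frac{1}{2}\hat z$ and 
$\hat a_3 = \hat b_3 = -\frac{\sqrt{3}}{2}\hat x - \frac{1}{2}\hat z$. 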

It is straightforward to check that this behavior does not violate the version (4.1) of 
$I_{3322}$, but this may be simply due to a mismatch in the choice of the labeling: one should 
check if any of the 576 versions is violated. As it turns out, the behavior (4.2) does not 
violate any version of $I_{3322}$, while it violates some versions of CHSH without achieving 
the Tsirelson bound (Exercise 4.1). Thus, in a study based on the local polytope, nothing 
similar to the Mermin outreach criterion appears as a tight inequality, nor does the 
behavior that maximises its violation play any special role. 
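
The numbers quoted for the behavior (4.2) are easy to reproduce. The sketch below assumes three measurement directions in the x − z plane at 120° from each other, identical for Alice and Bob, and uses the fact that for the state $|\Phi^+\rangle$ of (4.4) the correlator of two such directions is the cosine of the angle between them. The particular lifting of CHSH that is evaluated (Alice uses inputs 1 and 2, Bob inputs 1 and 3) is one choice among several and is mine.

```python
import math

# Angles from the z axis, in the x-z plane (assumed directions, 120 degrees apart).
ANGLES = [0.0, 2.0 * math.pi / 3.0, -2.0 * math.pi / 3.0]


def correlator(theta_a, theta_b):
    """E = <sigma_a (x) sigma_b> on |Phi+> for two directions in the x-z plane."""
    return math.cos(theta_a - theta_b)


E = [[correlator(ta, tb) for tb in ANGLES] for ta in ANGLES]

# With unbiased marginals, P(a = b | x, y) = (1 + E_xy) / 2.
p_equal = [[0.5 * (1.0 + E[x][y]) for y in range(3)] for x in range(3)]
mermin_sum = sum(sum(row) for row in p_equal)

# One lifting of CHSH (odd number of minus signs, local bound 2).
chsh = E[0][0] - E[0][2] - E[1][0] - E[1][2]

print("sum of P(a=b):", round(mermin_sum, 6))
print("CHSH lifting:", round(chsh, 6), "Tsirelson bound:", round(2 * math.sqrt(2), 6))
```

The CHSH value found this way exceeds 2 but stays below $2\sqrt{2}$, in line with the statement above.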
4.2.3 Chained inequalities 


Another family of inequalities that was proposed on intuitive grounds are the so-called 
chained inequalities, one for each (M, 2; M, 2) Bell scenario (Pearle, 1970; Braunstein and 
Caves, 1990). The most transparent form of the chained inequality for M inputs is 


$$C_M = P(a_1 = b_1) + P(b_1 = a_2) + P(a_2 = b_2) + \ldots + P(a_M = b_M) + P(b_M \neq a_1) \le 2M - 1. \tag{4.3}$$


This is a sum of 2M probabilities, which can’t all be 1 for local behaviors, because 
if $a_1 = b_1 = a_2 = \ldots = b_M$, then $b_M = a_1$. Since the maximum can be reached with LD 
behaviors, for which probabilities are either 0 or 1, the bound of 2M − 1 is indeed the 
highest we can expect. All the LD behaviors that saturate the bound are easy to list: There 
are the two behaviors $a_1 = \ldots = b_M = \pm 1$, that set $P(b_M \neq a_1) = 0$; the 2M behaviors 
$a_1 = \ldots = a_k \neq b_k = \ldots = b_M = \pm 1$, that set $P(a_k = b_k) = 0$, for k ∈ {1, ..., M}; and the 
2(M − 1) behaviors $a_1 = \ldots = b_k \neq a_{k+1} = \ldots = b_M = \pm 1$, that set $P(b_k = a_{k+1}) = 0$, for 
k ∈ {1, ..., M − 1}. This counting also shows that (4.3) does not define a facet of the 
local polytope for M > 2: The inequality is saturated by only 4M LD behaviors, whereas 
there must be at least $D_{NS} = M^2 + 2M$ extremal points on each facet. 
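
The counting can be double-checked by enumerating all $2^{2M}$ LD behaviors for small M. The following sketch (with names of my choosing) evaluates (4.3) on each of them and counts how many reach the maximum.

```python
from itertools import product


def chained_value(a, b):
    """C_M of (4.3) on an LD behavior with outputs a[0..M-1] and b[0..M-1]."""
    m = len(a)
    value = sum(int(a[j] == b[j]) for j in range(m))            # P(a_j = b_j)
    value += sum(int(b[j] == a[j + 1]) for j in range(m - 1))   # P(b_j = a_{j+1})
    value += int(b[m - 1] != a[0])                              # P(b_M != a_1)
    return value


def saturating_count(m):
    """Return (maximum of C_M over LD behaviors, number of LD behaviors reaching it)."""
    best, count = -1, 0
    for outputs in product((+1, -1), repeat=2 * m):
        v = chained_value(outputs[:m], outputs[m:])
        if v > best:
            best, count = v, 1
        elif v == best:
            count += 1
    return best, count


for m in range(2, 6):
    best, count = saturating_count(m)
    print(m, best, count, "facet would need at least", m * m + 2 * m)
```

For M = 2 the two numbers coincide (8 saturating points, dimension 8), consistently with CHSH being a facet.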

The most interesting feature of this family of inequalities is the fact that the quantum 
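violation comes arbitrarily close to the algebraic limit 2M in the limit M → ∞. Indeed, 
consider a two-qubit maximally entangled state 

$$|\Phi^+\rangle = \frac{1}{\sqrt{2}}\big(|{+}\hat z\rangle|{+}\hat z\rangle + |{-}\hat z\rangle|{-}\hat z\rangle\big) \tag{4.4}$$

and measurements defined in the x − z plane of the Bloch sphere as 

$$\hat a_j = \cos\theta_{2j-1}\,\hat z + \sin\theta_{2j-1}\,\hat x, \qquad \hat b_j = \cos\theta_{2j}\,\hat z + \sin\theta_{2j}\,\hat x \tag{4.5}$$

where $\theta_k = \frac{k\pi}{2M}$. With this choice, all the probabilities in (4.3) become equal to 
$\frac{1}{2}\left(1 + \cos\frac{\pi}{2M}\right) = \cos^2\frac{\pi}{4M}$ and consequently 

$$C_M = M\left(1 + \cos\frac{\pi}{2M}\right) \;\overset{M\to\infty}{\approx}\; 2M - \frac{\pi^2}{8M}. \tag{4.6}$$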


This violation is the maximal achievable with quantum resources (Wehner, 2006). 
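
As a numerical companion to (4.6), the sketch below sums the 2M probabilities term by term for the settings (4.5), using the fact that on $|\Phi^+\rangle$ the correlator of two directions in the x − z plane is the cosine of their relative angle. It is only an illustration; the function name is an assumption.

```python
import math


def chained_quantum_value(m):
    """C_M of (4.3) for the state (4.4) and the settings (4.5), summed term by term."""
    theta = [k * math.pi / (2 * m) for k in range(2 * m + 1)]  # theta[k] = k*pi/(2M)

    def corr(t_a, t_b):
        return math.cos(t_a - t_b)

    total = 0.0
    for j in range(1, m + 1):       # P(a_j = b_j): angles theta_{2j-1}, theta_{2j}
        total += 0.5 * (1.0 + corr(theta[2 * j - 1], theta[2 * j]))
    for j in range(1, m):           # P(b_j = a_{j+1}): angles theta_{2j}, theta_{2j+1}
        total += 0.5 * (1.0 + corr(theta[2 * j + 1], theta[2 * j]))
    total += 0.5 * (1.0 - corr(theta[1], theta[2 * m]))   # P(b_M != a_1)
    return total


values = {m: chained_quantum_value(m) for m in (2, 3, 5, 10, 100)}
for m, v in values.items():
    closed_form = m * (1.0 + math.cos(math.pi / (2 * m)))
    print(m, round(v, 6), round(closed_form, 6), "local bound:", 2 * m - 1)
```

For M = 2 one finds $2 + \sqrt{2}$, which is the Tsirelson bound of CHSH written in terms of probabilities.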

Now, 2M is very close to 2M — 1 in the limit of large M, so these inequalities 
become increasingly sensitive to noise (Exercise 4.2). Thus, the chained inequalities 
are not a tool of choice for the practical certification of nonlocality. However, from the 
convergence of the quantum bound to the maximum, theorists have been able to derive 
foundational observations on quantum theory (section 9.4) and its potential applications 
(section 11.5). 
Before moving on, let us mention two other forms of the same inequality that are often 
found. First, one can invert all the terms of the equation using $P(a_j = b_k) + P(a_j \neq b_k) = 1$ 
and get 

$$C'_M = 2M - C_M = P(a_1 \neq b_1) + P(b_1 \neq a_2) + \ldots + P(a_M \neq b_M) + P(b_M = a_1) \ge 1, \tag{4.7}$$

the maximal violation being now the minimum algebraic value $C'_M = 0$ for all M. This 
expression has the pleasant feature that neither the local nor the algebraic bound depend 
on M. Second, since the chained inequalities are correlation inequalities, one can rewrite 
them using only correlators: Indeed, using $E_{jk} = 2P(a_j = b_k) - 1 = 1 - 2P(a_j \neq b_k)$, 
one has 

$$C''_M = 2(C_M - M) = E_{11} + E_{21} + E_{22} + E_{32} + \ldots + E_{MM} - E_{1M} \le 2(M-1). \tag{4.8}$$


From this form, it is manifest that the chained inequality for M = 2 is CHSH. 


4.3 Two Inputs, Several Outputs 


We move now to Bell scenarios (2, m; 2, m) and study the family of CGLMP inequalities 
(Collins et al., 2002a). There is one such inequality per value of m, often denoted $I_{22mm}$ 
or $I_m$, as we shall do in this section. They are all facets of the corresponding local polytope 
(Masanes, 20038). For m = 3, $I_3$ is the only inequality added to the liftings of CHSH. For 
m > 3, the local polytope has not been fully solved, but it is known that there are other 
inequalities than those of the CGLMP family. Here we give first a detailed presentation 
of $I_3$, then an overview of the common features of all $I_m$. 


4.3.1 CGLMP for m=3 


The CGLMP inequality for m = 3, which had been described independently in 
(Kaszlikowski et al., 2002), can be constructed in an instructive way. Start from the sum 
$P(a_0 = b_0) + P(a_0 = b_1) + P(a_1 = b_0) + P(a_1 = b_1 \oplus 1)$, where ⊕ denotes summation 
modulo 3. This is the same kind of contradiction as CHSH: If $a_0 = b_0$, $a_0 = b_1$ and $a_1 = b_0$, 
then LVs enforce $a_1 = b_1$; but the fourth probability is for $a_1 = b_1 \oplus 1$, so the four terms 
of the sum cannot be all 1 under LV (it is easy to check on all the local deterministic 
behaviors that the maximum is actually 3). Clearly, any triple of conditions leads to 
a similar enforcement: For instance, $a_0 = b_0$, $a_0 = b_1$ and $a_1 = b_1 \oplus 1$ would enforce 
$a_1 = b_0 \oplus 1$ under LVs. Now, one can penalize LV even more by subtracting from the 
previous sum the probabilities of the conditions enforced by LVs on each of the four 
triples: 
$$I_3 = P(a_0 = b_0) + P(a_0 = b_1) + P(a_1 = b_0) + P(a_1 = b_1 \oplus 1) - P(a_0 = b_0 \oplus 2) - P(a_0 = b_1 \oplus 1) - P(a_1 = b_0 \oplus 1) - P(a_1 = b_1) \le 2. \tag{4.9}$$


This is the version of $I_3$ presented in (Collins et al., 2002a). Notice that, since we work in 
modulo 3, we could have replaced the condition $a_1 \neq b_1$ by $a_1 = b_1 \oplus 2$ instead of $a_1 = b_1 \oplus 1$. 
But this is merely one of the many relabelings we are by now familiar with,³ leading 
to an equivalent version of the inequality. 
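
Both local bounds mentioned in this construction (3 for the sum of the four positive terms, 2 for $I_3$) can be checked on the 81 local deterministic behaviors. The sketch below does so; the function name is an assumption of the illustration.

```python
from itertools import product


def i3_terms(a0, a1, b0, b1):
    """Return (I_3, sum of the four positive terms) of (4.9) on an LD behavior."""
    plus = (int(a0 == b0) + int(a0 == b1) + int(a1 == b0)
            + int(a1 == (b1 + 1) % 3))
    minus = (int(a0 == (b0 + 2) % 3) + int(a0 == (b1 + 1) % 3)
             + int(a1 == (b0 + 1) % 3) + int(a1 == b1))
    return plus - minus, plus


ld_behaviors = list(product(range(3), repeat=4))   # (a0, a1, b0, b1)
max_i3 = max(i3_terms(*o)[0] for o in ld_behaviors)
max_plus = max(i3_terms(*o)[1] for o in ld_behaviors)
print("max of the four positive terms:", max_plus)
print("max of I_3:", max_i3)
```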

Having obtained the inequality, both (Collins et al., 2002a) and (Kaszlikowski et al., 
2002) made the natural assumption that the maximal violation would be obtained with 
the maximally entangled state of two qutrits, $|\Phi\rangle = \frac{1}{\sqrt{3}}\left(|00\rangle + |11\rangle + |22\rangle\right)$, and went on 
to find the value and the corresponding measurements. Shortly afterwards, (Acin et al., 2002) 
wrote down the Bell operator for those measurements and discovered that its highest 
eigenvalue is larger than the value obtained previously, the associated eigenvector being 
the non-maximally entangled state 

$$|\Phi(\gamma)\rangle = \frac{1}{\sqrt{2+\gamma^2}}\big(|0\rangle|0\rangle + |1\rangle|1\rangle + \gamma\,|2\rangle|2\rangle\big) \quad\text{with}\quad \gamma = (\sqrt{11} - \sqrt{3})/2. \tag{4.10}$$


To add to the surprise, changing the measurements did not bring any improvement: 
The optimal measurements are the same for both $|\Phi\rangle$ and $|\Phi(\gamma)\rangle$, but a larger violation 
is obtained for the latter. As the argument stood, it relied on using qutrits: It was later 
confirmed that the maximal violation of this inequality uniquely identifies $|\Phi(\gamma)\rangle$ in the 
sense of self-testing (Bancal et al., 2015). Thus, the maximal violation of CGLMP for 
m = 3 is achieved only for a non-maximally entangled state. 


4.3.2 CGLMP for any m 


Generalizing (4.9), the form of Im found in (Collins et al., 2002a) contains only 
probabilities of the form $P(a = b \oplus k|x, y)$, with ⊕ representing sum modulo m. Thus, each 
inequality of the CGLMP family is manifestly a correlation inequality. For m > 3, 
however, the coefficients of those probabilities become a bit cumbersome. This is why 
various authors have opted for different forms, that are more compact, even if the 
correlation character is no longer transparent. Frequent practitioners will probably need 
a translation manual to navigate the literature; for the scope of this book, I shall simply 
mention two forms. 

The most compact form is the one of (Zohren and Gill, 2008): Assuming the labeling 
a, b € {0,...,m — 1}, it reads 


$$I^{ZG}_m = P(a \le b|0,0) + P(a \ge b|0,1) + P(a \ge b|1,0) + P(a < b|1,1) \le 3 \tag{4.11}$$


3 It is not the relabeling $b_1 \to b_1 \oplus 1$, because this would affect the condition $a_0 = b_1$ too. Rather, it is the 
relabeling that consists in exchanging the roles of Alice and Bob: Indeed, if $a_1 = b_1 \oplus 1$, then $b_1 = a_1 \oplus 2$. 




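where $P(a \le b|0,0) = \sum_{a=0}^{m-1}\sum_{b=a}^{m-1} P(a,b|0,0)$ and similarly for the other terms. The 
Collins-Gisin form for the same version of the inequality is 

$$I^{CG}_m\cdot\mathcal{P} \le 0 \quad\text{with}\quad I^{CG}_m = \begin{array}{c||cccc|cccc} & -1 & -1 & \cdots & -1 & 0 & 0 & \cdots & 0 \\ \hline\hline -1 & 1 & 1 & \cdots & 1 & 1 & 0 & \cdots & 0 \\ -1 & 0 & 1 & \cdots & 1 & 1 & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots & \vdots & & \ddots & \vdots \\ -1 & 0 & 0 & \cdots & 1 & 1 & 1 & \cdots & 1 \\ \hline 0 & 1 & 0 & \cdots & 0 & -1 & 0 & \cdots & 0 \\ 0 & 1 & 1 & \cdots & 0 & -1 & -1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots & \vdots & & \ddots & \vdots \\ 0 & 1 & 1 & \cdots & 1 & -1 & -1 & \cdots & -1 \end{array} \tag{4.12}$$

where each of the four square blocks is of dimension (m − 1) × (m − 1). In this form, it is 
manifest that $I^{CG}_2$ is CHSH expressed in the CH form (2.32). The two forms are simply 
related by $I^{ZG}_m = I^{CG}_m + 3$ (Exercise 4.3). 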
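
A quick consistency check of the two forms is possible on local deterministic behaviors, which are in particular no-signaling. The sketch below assumes that the first form counts the events a ≤ b, a ≥ b, a ≥ b and a < b for the four input pairs, and that the Collins-Gisin table lists the outputs 0, ..., m − 2 with triangular blocks filled accordingly; these readings, like the function names, are assumptions of the illustration.

```python
from itertools import product


def zg_value(a0, a1, b0, b1):
    """First form of I_m on an LD behavior with outputs a_x, b_y in {0, ..., m-1}."""
    return int(a0 <= b0) + int(a0 >= b1) + int(a1 >= b0) + int(a1 < b1)


def cg_value(m, a0, a1, b0, b1):
    """Collins-Gisin form on the same LD behavior; only outputs 0..m-2 are listed."""
    def listed(v):
        return v <= m - 2

    value = -int(listed(b0)) - int(listed(a0))               # marginal terms
    value += int(listed(a0) and listed(b0) and a0 <= b0)     # block (x, y) = (0, 0)
    value += int(listed(a0) and listed(b1) and a0 >= b1)     # block (0, 1)
    value += int(listed(a1) and listed(b0) and a1 >= b0)     # block (1, 0)
    value -= int(listed(a1) and listed(b1) and a1 >= b1)     # block (1, 1)
    return value


def check(m):
    """Return (local bound of the first form, whether the offset 3 holds on all LD points)."""
    outputs = list(product(range(m), repeat=4))              # (a0, a1, b0, b1)
    local_bound = max(zg_value(*o) for o in outputs)
    offset_ok = all(zg_value(*o) == cg_value(m, *o) + 3 for o in outputs)
    return local_bound, offset_ok


results = {m: check(m) for m in range(2, 6)}
for m, (bound, ok) in results.items():
    print(m, bound, ok)
```

This covers only the extremal local points; the general statement for no-signaling behaviors is the object of Exercise 4.3.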

Turning to quantum theory, there is no known analytical expression for the values 
of the maximal violation of Im as a function of m. For small values of m, Zohren and 
Gill (2008) obtained lower bounds by numerical optimization assuming states of two 
m-dimensional systems and projective measurements, that were later found to coincide, 
within numerical precision, with upper bounds obtained with the NPA hierarchy. The 
reported maximal violations increase with m, although they are bounded by 4 as is 
obvious from (4.11). The states that achieve the maximal violation are never maximally 
entangled for any m > 2. Specifically, denoting $\{|k\rangle \,|\, k = 0, \ldots, m-1\}$ the Schmidt bases 
of the optimal state for both Alice and Bob, the optimal measurements are found to be 
projections in the bases 

$$|a\rangle_x = \sum_{k=0}^{m-1} \frac{e^{i\frac{2\pi}{m}ak}}{\sqrt{m}}\, e^{ik\alpha_x}\,|k\rangle, \qquad |b\rangle_y = \sum_{k=0}^{m-1} \frac{e^{-i\frac{2\pi}{m}bk}}{\sqrt{m}}\, e^{ik\beta_y}\,|k\rangle \tag{4.13}$$

for a, b ∈ {0, 1, ..., m − 1}, with $\alpha_0 = 0$, $\alpha_1 = \frac{\pi}{m}$, $\beta_0 = -\frac{\pi}{2m}$ and $\beta_1 = \frac{\pi}{2m}$. 
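
To get a feeling for the numbers, one can evaluate the inequality on states that are diagonal in the Schmidt basis. The sketch below assumes Fourier-type measurement bases with the phase offsets $\alpha_0 = 0$, $\alpha_1 = \pi/m$, $\beta_0 = -\pi/(2m)$, $\beta_1 = \pi/(2m)$, and the form of the inequality with the events a ≤ b, a ≥ b, a ≥ b, a < b (local bound 3). These conventions, and in particular the placement of the coefficient γ on the middle Schmidt term in the m = 3 example, are assumptions of this illustration; with a different labeling of the basis the placement would change accordingly.

```python
import cmath
import math


def zg_quantum_value(m, schmidt=None):
    """Value of the four-term inequality for the state sum_k c_k |k>|k>.

    schmidt: list of real coefficients c_k (default: maximally entangled state).
    """
    if schmidt is None:
        schmidt = [1.0 / math.sqrt(m)] * m
    alpha = (0.0, math.pi / m)
    beta = (-math.pi / (2 * m), math.pi / (2 * m))

    def prob(a, b, x, y):
        # Amplitude <a|_x <b|_y applied to the state, for the assumed bases.
        amp = 0j
        for k, c in enumerate(schmidt):
            phase = 2 * math.pi * k * (a - b) / m + k * (alpha[x] + beta[y])
            amp += c * cmath.exp(-1j * phase) / m
        return abs(amp) ** 2

    outs = range(m)
    value = sum(prob(a, b, 0, 0) for a in outs for b in outs if a <= b)
    value += sum(prob(a, b, 0, 1) for a in outs for b in outs if a >= b)
    value += sum(prob(a, b, 1, 0) for a in outs for b in outs if a >= b)
    value += sum(prob(a, b, 1, 1) for a in outs for b in outs if a < b)
    return value


print("m = 2, maximally entangled:", zg_quantum_value(2))
print("m = 3, maximally entangled:", zg_quantum_value(3))

gamma = (math.sqrt(11.0) - math.sqrt(3.0)) / 2.0
norm = math.sqrt(2.0 + gamma ** 2)
print("m = 3, non-maximally entangled:",
      zg_quantum_value(3, [1.0 / norm, gamma / norm, 1.0 / norm]))
```

For m = 2 the value coincides with the Tsirelson bound in the CH normalization, $3 + (\sqrt{2}-1)/2$; for m = 3 the non-maximally entangled state gives a strictly larger value than the maximally entangled one with the same measurements.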


4.4 Hardy's Test and the Magic Square 


We finish this chapter with a detailed study of the two Bell tests based on extreme 
correlations that we introduced in section 1.4. 


4.4.1 Hardy’s test 


Hardy’s test was described in subsection 1.4.3. It is a test in the (2, 2; 2, 2) Bell scenario, 
so we know that the only tight inequality is CHSH. Like Mermin’s outreach criterion, it 
is a test valid under some constraints on the behavior. 
We work with pure two-qubit states. The calculation is cumbersome if one starts 
from the Schmidt decomposition (3.15), see Appendix G.2. A much more convenient 
form is 

$$|\psi\rangle = \alpha\,|{+}\hat z\rangle|{-}\hat z\rangle + \alpha\,|{-}\hat z\rangle|{+}\hat z\rangle + \beta\,|{-}\hat z\rangle|{-}\hat z\rangle \tag{4.14}$$


with α, β ∈ ℝ⁺ and 2α² + β² = 1 (Exercise 4.4). In this notation, the first constraint 
(1.11) is immediately satisfied for $A_0 = B_0 = \sigma_z$. The second constraint (1.12) imposes 
that $P(b_1 = +1|a_0 = -1) = 0$. We know that $|a_0 = -1\rangle = |{-}\hat z\rangle$ of Alice, conditioned 
on which Bob’s non-normalized state is $\alpha\,|{+}\hat z\rangle + \beta\,|{-}\hat z\rangle$. Therefore, this must be the 
eigenstate of $B_1$ for eigenvalue $b_1 = -1$. A symmetric argument can be made for the 
third constraint (1.13). Thus we have found 

$$|a_1 = +1\rangle = |b_1 = +1\rangle = \frac{\beta\,|{+}\hat z\rangle - \alpha\,|{-}\hat z\rangle}{\sqrt{\alpha^2 + \beta^2}} \tag{4.15}$$

whence follows 

$$P(+1,+1|1,1) = \frac{\alpha^4\,(1 - 2\alpha^2)}{(1 - \alpha^2)^2}. \tag{4.16}$$


The LV prediction (1.14) is that this probability should be 0. This is the case only 
for α = 0 (product state) or β = 0 (maximally entangled state), while every pure 
non-maximally entangled state of two qubits shows nonlocality in Hardy’s test. The 
easiest example to remember is the case of $\alpha = \beta = \frac{1}{\sqrt{3}}$, for which $A_1 = B_1 = -\sigma_x$ and 
$P(+,+|1,1) = \frac{1}{12}$ (Exercise 4.4). The maximal achievable value is 

$$\max_{\alpha} P(+,+|1,1) = \frac{5\sqrt{5} - 11}{2} \approx 0.0902 \tag{4.17}$$

obtained for $\alpha = (\sqrt{5} - 1)/2$, that is for $A_1 = B_1 = (2 - \sqrt{5})\,\sigma_z - 2\sqrt{\sqrt{5} - 2}\;\sigma_x$. This 
value is the quantum maximum irrespective of the dimension and can only be reached 
with this two-qubit state in the sense of self-testing (Rabelo et al., 2012). Our presentation 
of Hardy’s test violates CHSH in the form $-E_{00} + E_{01} + E_{10} + E_{11} \le 2$, up to a 
value ≈ 2.3607 for the behavior that achieves (4.17). 
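
The optimization over α is a one-parameter problem and can be reproduced numerically. The sketch below evaluates the closed form (4.16), cross-checks it against a direct computation from the state (4.14) and the eigenvector (4.15), and scans α on a grid; the grid size and the function names are arbitrary choices of the illustration.

```python
import math


def hardy_probability(alpha):
    """P(+1,+1|1,1) as given by (4.16), with beta^2 = 1 - 2 alpha^2."""
    return alpha ** 4 * (1.0 - 2.0 * alpha ** 2) / (1.0 - alpha ** 2) ** 2


def hardy_probability_from_state(alpha):
    """Same probability, computed directly from the state (4.14) and the vector (4.15)."""
    beta = math.sqrt(1.0 - 2.0 * alpha ** 2)
    norm = math.sqrt(alpha ** 2 + beta ** 2)
    u = (beta / norm, -alpha / norm)          # |a1=+1> = |b1=+1> on (|+z>, |-z>)
    psi = [[0.0, alpha], [alpha, beta]]       # amplitudes on |i>_A |j>_B, 0 = +z, 1 = -z
    amp = sum(u[i] * u[j] * psi[i][j] for i in range(2) for j in range(2))
    return amp ** 2


# Scan alpha over (0, 1/sqrt(2)), the range allowed by 2 alpha^2 + beta^2 = 1.
N = 200_000
grid = [i / N / math.sqrt(2.0) for i in range(1, N)]
best_alpha = max(grid, key=hardy_probability)
best_value = hardy_probability(best_alpha)

print("best alpha:", round(best_alpha, 5), "expected:", round((math.sqrt(5) - 1) / 2, 5))
print("best value:", round(best_value, 6), "expected:", round((5 * math.sqrt(5) - 11) / 2, 6))
print("alpha = beta = 1/sqrt(3):", hardy_probability(1 / math.sqrt(3)))
```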

It is instructive to understand why the maximally entangled state is not detected by 
Hardy’s test. Denote by $|\psi_{B|a_0}\rangle$ the state on Bob’s side conditioned on Alice having 
observed $a_0$. For maximally entangled states, $|\psi_{B|a_0=+1}\rangle$ and $|\psi_{B|a_0=-1}\rangle$ are orthogonal 
states (in particular, they belong to the same basis). Thus, the first two constraints 
$P(a_0 = +1, b_0 = +1) = 0$ and $P(a_0 = -1, b_1 = +1) = 0$ imply that $B_1 = -B_0$, and we 
have seen in subsection 3.1.3 that nonlocality cannot be demonstrated when one party’s 
measurements commute. Conversely, $|\psi_{B|a_0=+1}\rangle$ and $|\psi_{B|a_0=-1}\rangle$ are generically not 
orthogonal for non-maximally entangled states. The iteration of the construction is known 
as Hardy’s ladder. 
4.4.2 The Magic Square 


The Magic Square test was described in subsection 1.4.4. It is defined in a (3, 8; 3, 8) 
Bell scenario. To describe the quantum measurement that achieves the perfect statistics, 
let’s consider the following square of operators: 


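$$\begin{array}{|c|c|c|} \hline \mathbb{1}\otimes\sigma_z & \sigma_z\otimes\mathbb{1} & \sigma_z\otimes\sigma_z \\ \hline \sigma_x\otimes\mathbb{1} & \mathbb{1}\otimes\sigma_x & \sigma_x\otimes\sigma_x \\ \hline -\sigma_x\otimes\sigma_z & -\sigma_z\otimes\sigma_x & \sigma_y\otimes\sigma_y \\ \hline \end{array} \tag{4.18}$$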


The three operators on each line and on each column commute. Besides, denoting by 
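$O_{ij}$ the operator in the i-th line and j-th column, it holds 

$$\prod_{j=1}^{3} O_{ij} = \mathbb{1}\otimes\mathbb{1} \ \text{ for all } i, \qquad \prod_{i=1}^{3} O_{ij} = -\mathbb{1}\otimes\mathbb{1} \ \text{ for all } j. \tag{4.19}$$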


Thus, if Alice’s input x € {1, 2, 3} tells her to measure the three operators listed in line x 
of the square, and Bob’s input y € {1, 2, 3} tells him to measure the three operators listed 
in column y of the square, the first two of the three conditions (1.15) are automatically 
satisfied for any state. In order to satisfy the third condition, we need to find a four-qubit 
state such that Alice’s and Bob’s outputs are the same for the operator where the line 
and the column intersect, i.e., $\langle\Psi|O_{xy}\otimes O_{xy}|\Psi\rangle = 1$. The state that does the job is the 
product of two maximally entangled two-qubit states: For the operators as written, it is 
$|\Phi^+\rangle_{A_1B_1}\otimes|\Phi^+\rangle_{A_2B_2}$ with $|\Phi^+\rangle$ given in (4.4). Once again, it was later proved to be unique 
in the sense of self-testing (Wu et al., 2016). 
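
All the algebraic properties used in this argument can be verified mechanically. The sketch below builds a square of two-qubit Pauli products (rows: identity and $\sigma_z$ products; $\sigma_x$ products; $-\sigma_x\otimes\sigma_z$, $-\sigma_z\otimes\sigma_x$, $\sigma_y\otimes\sigma_y$), checks commutation along lines and columns, the products of (4.19), and the perfect correlations on the product of two $|\Phi^+\rangle$ states. Matrices are stored as nested lists to avoid any dependency; the helper names and the qubit ordering $(A_1 A_2 B_1 B_2)$ are choices of this illustration.

```python
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def kron(a, b):
    """Kronecker product of two matrices stored as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def scale(c, a):
    return [[c * x for x in row] for row in a]


def equal(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def commute(a, b):
    return equal(matmul(a, b), matmul(b, a))


SQUARE = [
    [kron(I2, Z), kron(Z, I2), kron(Z, Z)],
    [kron(X, I2), kron(I2, X), kron(X, X)],
    [scale(-1, kron(X, Z)), scale(-1, kron(Z, X)), kron(Y, Y)],
]
ID4 = kron(I2, I2)

rows_ok = all(equal(matmul(matmul(r[0], r[1]), r[2]), ID4) for r in SQUARE)
cols_ok = all(equal(matmul(matmul(SQUARE[0][j], SQUARE[1][j]), SQUARE[2][j]),
                    scale(-1, ID4)) for j in range(3))
commuting_ok = all(commute(SQUARE[i][j], SQUARE[i][k]) and commute(SQUARE[j][i], SQUARE[k][i])
                   for i in range(3) for j in range(3) for k in range(3))

# |Phi+>_{A1B1} |Phi+>_{A2B2} in the ordering (A1 A2 B1 B2): amplitude 1/2 on |k>_A |k>_B.
psi = [0.5 if r // 4 == r % 4 else 0.0 for r in range(16)]


def expectation(op):
    return sum(psi[r] * op[r][c] * psi[c] for r in range(16) for c in range(16))


correlations = [[expectation(kron(SQUARE[x][y], SQUARE[x][y])) for y in range(3)]
                for x in range(3)]
print(rows_ok, cols_ok, commuting_ok)
print([[round(abs(v), 6) for v in row] for row in correlations])
```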
This Bell test has been formulated requiring perfect correlations. One could try and 
relax those requirements by formulating the inequality 


$$\sum_{x,y} P\Big(\prod_j a_j = +1,\ \prod_k b_k = -1,\ a_y = b_x \,\Big|\, x, y\Big) \le 8. \tag{4.20}$$

This inequality is not a facet of the local polytope in the (3, 8; 3, 8) Bell scenario (Gisin 
et al., 2007). However, the verifier could ask Alice to enforce $\prod_j a_j = +1$ and Bob to 
enforce $\prod_k b_k = -1$, by outputting their third bit according to the rule. Now Alice and 
Bob output only two bits, thus this modification defines a Bell test in the (3, 4; 3, 4) Bell 
scenario. For it, the inequality 

$$\sum_{x,y} P\Big(a_y = b_x \,\Big|\, x, y;\ \prod_j a_j = +1,\ \prod_k b_k = -1\Big) \le 8 \tag{4.21}$$


does define a facet of the local polytope. 
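
The local bound 8 can be recovered by enumeration: when the parity rules are enforced, a deterministic strategy for Alice is a choice of one triple with product +1 for each line, and for Bob one triple with product −1 for each column. The sketch below counts, for every pair of such strategies, on how many of the nine input pairs the outputs agree at the intersection; the data layout is an assumption of the illustration.

```python
from itertools import product

# Allowed output triples under the parity rules.
ROW_TRIPLES = [t for t in product((+1, -1), repeat=3) if t[0] * t[1] * t[2] == +1]
COL_TRIPLES = [t for t in product((+1, -1), repeat=3) if t[0] * t[1] * t[2] == -1]


def matches(alice, bob):
    """Number of input pairs (x, y) on which Alice and Bob agree at the intersection.

    alice[x][y]: Alice's entry for line x, column y;
    bob[y][x]:   Bob's entry for column y, line x.
    """
    return sum(int(alice[x][y] == bob[y][x]) for x in range(3) for y in range(3))


local_bound = max(matches(alice, bob)
                  for alice in product(ROW_TRIPLES, repeat=3)
                  for bob in product(COL_TRIPLES, repeat=3))
print("local bound:", local_bound, "out of 9")
```

That 9 cannot be reached is the usual parity argument: a single filling of the square cannot have all line products equal to +1 and all column products equal to −1.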
EXERCISES 


Exercise 4.1 We consider the behavior (4.2) : 


(a) Prove that it violates some liftings of CHSH. Recall that the Collins-Gisin representation 
of CHSH is (2.32). 

(b) Prove that it does not violate any of the 576 versions of I3322. This can be done by brute 
force on a computer; we rather suggest to do it by hand by keeping the inequality fixed 
and applying the symmetries on this (very symmetric) behavior. 


Exercise 4.2 Prove that, in order to violate the chained inequality (4.3) with a two-qubit 
Werner state (3.23), one needs $W > 1 - \frac{1}{M}$. 


Exercise 4.3 Prove that the two forms (4.11) and (4.12) of the CGLMP inequalities 
are related by $I^{ZG}_m = I^{CG}_m + 3$ for no-signaling behaviors. Does this equivalence also hold for 
signaling behaviors? (If yes, prove it; if not, you should be able to find a behavior that violates 
one but not the other.) 


Exercise 4.4 This exercise proves two statements made in discussing Hardy’s test (subsection 
4.4.1): 


1. Prove that any two-qubit pure state can be written as (4.14). Due to the uniqueness of 
the Schmidt decomposition (3.15), it is enough to prove that the largest eigenvalue of 
the reduced states can take any value between 1/2 and 1. 
2. Prove that the state $\frac{1}{\sqrt{3}}\big(|{+}\hat z\rangle|{-}\hat z\rangle + |{-}\hat z\rangle|{+}\hat z\rangle + |{-}\hat z\rangle|{-}\hat z\rangle\big)$ leads to $P(+,+|1,1) = \frac{1}{12}$ 
for $A_1 = B_1 = -\sigma_x$. 


5 
Multipartite Bell Nonlocality 


Non serve essere quindici in squadra se tutti in propria area. 
No point of being fifteen in a team, if all stay in their penalty area. 
Attributed to V. Boskov 


We conclude Part I with an overview of Bell nonlocality in multipartite scenarios. 


5.1 Definition and Systematic Results 


5.1.1 Multipartite local behaviors 


The definition of a multipartite local behavior is the natural extension of the bipartite 
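case (2.16). Denoting by $x^{(i)}$ and $a^{(i)}$ the input and output of the i-th player, with 
i = 1, ..., n, a behavior is local if its probability distributions can be written 

$$P(a^{(1)}, \ldots, a^{(n)}|x^{(1)}, \ldots, x^{(n)}) = \int d\lambda\,\rho(\lambda)\,\prod_{i=1}^{n} P_\lambda(a^{(i)}|x^{(i)}) \tag{5.1}$$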


for all inputs. 

It is then straightforward to extend Fine’s theorem: In particular, any local behavior 
can be seen as convex combination of local deterministic behaviors, the latter being the 
extremal points of the local polytope. 

In this sense, there is nothing fundamentally new in the definition of locality for 
multipartite Bell tests (although nonlocality is richer, since there are various ways in 
which a behavior may fail to be local, see section 5.3). That being said, the complexity of 
the local polytope grows very quickly, to the point that few systematic studies are known: 
We review them in subsections 5.1.3 and 5.1.4. Before this, let us turn to quantum theory 
and prove a result that was left pending in subsection 3.2.4: All pure entangled states are 
nonlocal resources, even multipartite ones. 


5.1.2 All pure entangled states are nonlocal resources 


We want to prove that any pure state $|\Psi\rangle_{A^{(1)}\ldots A^{(n)}}$ with n > 2 is a nonlocal resource if 
it’s entangled. We follow the proof of (Popescu and Rohrlich, 1992a). Without loss of 
generality, the state is taken to be entangled in all its subsystems: If the state were of the 
form $|\Psi\rangle_{A^{(1)}\ldots A^{(m)}} \otimes |\Psi'\rangle_{A^{(m+1)}\ldots A^{(n)}}$, we know already that correlations across the tensor 
product cannot exhibit nonlocality, so we would consider the two groups separately. 
We first consider a Bell test for the first and the second players $A^{(1)} = A$ and 
$A^{(2)} = B$. Each of the other players is queried with $x^{(j)} = 0$ (j = 3, ..., n), performs the 
corresponding measurement $M^{x^{(j)}=0}$ and sends the output to the verifier. Only when 
the verifier receives a particular string of outputs, say $(a^{(3)}, \ldots, a^{(n)}) = (+1, \ldots, +1)$, 
he queries Alice and Bob with a normal CHSH test with x, y ∈ {1, 2}. The multipartite 
state being entangled in all its parties, the measurement operators $M^{x^{(j)}=0}$ can always 
be chosen such that the conditional pure state $|\psi\rangle_{AB} \propto \big(\bigotimes_{j=3}^{n} \langle a^{(j)} = +1|\big)\,|\Psi\rangle_{A^{(1)}\ldots A^{(n)}}$ is 
entangled.¹ Then, Alice and Bob can choose the measurements associated to the inputs 
x, y ∈ {1, 2} such that the resulting bipartite behavior 

$$\big\{P(a, b|x, y;\ x^{(j)} = 0,\ a^{(j)} = +1 \text{ for } j = 3, \ldots, n)\ \big|\ x, y \in \{1, 2\},\ a, b \in \{-1, +1\}\big\} \tag{5.2}$$

is nonlocal. But then, the multipartite behavior 

$$\mathcal{P}_{\rm sub} = \big\{P(a, b, +1, \ldots, +1|x, y, 0, \ldots, 0)\ \big|\ x, y \in \{1, 2\},\ a, b \in \{-1, +1\}\big\}$$

is nonlocal too: If it were of the form (5.1), the bipartite behavior (5.2) would be local. 

This means that the Bell test just described for players 1 and 2 can be a subset of
rounds of a bigger Bell test that assesses the multipartite behavior, and any multipartite
behavior $\mathcal{P}$ will be nonlocal as soon as it contains $\mathcal{P}_{NL}$ as a sub-behavior. This concludes
the proof that any pure entangled state is a nonlocal resource—a disappointingly simple 
and certainly non-optimal proof, insofar as it relies only on pairwise nonlocality, but 
sufficient for the purpose. 


5.1.3 The simplest multipartite Bell scenario 


The simplest multipartite Bell scenario is the (2, 2; 2, 2; 2, 2) scenario, that involves 
three players, each with binary input and output. We can anticipate that the local 
polytope will have positivity facets and liftings of CHSH in some multiplicity. But 
when the full characterization was reported (Pitowsky and Svozil, 2001; Sliwa, 2003), it 
showed an unexpected explosion in complexity. The local polytope has 53856 facets— 
for comparison, recall that the two-player scenario has 24. After sorting them out for 
relabeling, there remain 46 inequivalent types of facets: The positivity facets and the 
liftings of CHSH are indeed present, and so is the Mermin inequality (1.10) that we’ll 
meet again in the next section. For a rather systematic study of all the others, the reader 
can refer to (Lopez-Rosa et al., 2016). 


1 The formal proof is cumbersome to write down: In fact, the formalization given in Popescu and Rohrlich 
(1992a) was incorrect and was fixed in Gachechiladze and Guehne (2017).

Here, we single out the only inequality that cannot be violated with quantum behaviors
(Almeida et al., 2010):


$$G = P(0, 0, 0|0, 0, 0) + P(1, 1, 0|0, 1, 1) + P(0, 1, 1|1, 0, 1) + P(1, 0, 1|1, 1, 0) \le 1. \tag{5.3}$$


This inequality has been called “guess-your-neighbor-input” (GYNI) inequality because
the four probabilities are of the form $P(a = y, b = z, c = x|x, y, z)$. The local bound is
easy to check on LD behaviors: $P(0, 0, 0|0, 0, 0) = 1$ means $a_0 = b_0 = c_0 = 0$, and then
$P(1, 1, 0|0, 1, 1) = 0$ because it implies $a_0 = 1$, and so on. It is also easy to prove that
quantum theory cannot violate the inequality. Indeed, in quantum theory, there exist
states and measurements such that $P(a, b, c|x, y, z) = \mathrm{Tr}(\rho\, \Pi^x_a \otimes \Pi^y_b \otimes \Pi^z_c) \equiv \mathrm{Tr}(\rho\, \Pi^{xyz}_{abc})$. Now,
notice that the four projectors $\Pi^{xyz}_{abc}$ that enter (5.3) are mutually orthogonal: $\Pi^{000}_{000}$ is
associated to output $a = 0$ for measurement $x = 0$, while $\Pi^{011}_{110}$ is associated to output
$a = 1$ for the same measurement $x = 0$, and so on. Thus,
$\hat{G} = \Pi^{000}_{000} + \Pi^{011}_{110} + \Pi^{101}_{011} + \Pi^{110}_{101} \le \mathbb{I}$
and therefore $G = \mathrm{Tr}(\rho \hat{G}) \le 1$ as claimed.

Outputting another player’s input deterministically is an extreme form of signaling,
hence surely one cannot reach the algebraic limit $G = 4$ with no-signaling behaviors. It
was proved that no-signaling behaviors can violate up to $G = \frac{4}{3}$. The GYNI inequality
is thus irrelevant for the purpose of certifying the nonlocality of quantum resources.
However, it has inspired some approaches to single out quantum theory among
no-signaling theories (section 10.4).
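
The local bound of (5.3) can also be checked by brute force. The following is a minimal sketch, assuming local deterministic strategies are stored as pairs of outputs indexed by the input; the names `GYNI_INPUTS`, `gyni_local` and `local_max` are ours and purely illustrative.

```python
from itertools import product

# Inputs (x, y, z) of the four terms of (5.3); the rewarded outputs are
# (a, b, c) = (y, z, x), i.e. each player guesses the neighbor's input.
GYNI_INPUTS = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]


def gyni_local(a, b, c):
    """Value of G for a local deterministic strategy.

    a[x] is Alice's output on input x, and similarly b[y], c[z].
    """
    return sum(
        1
        for (x, y, z) in GYNI_INPUTS
        if (a[x], b[y], c[z]) == (y, z, x)
    )


# All deterministic response functions of one player: (output on 0, output on 1)
responses = list(product((0, 1), repeat=2))

# Maximum of G over the 4**3 = 64 local deterministic behaviors
local_max = max(
    gyni_local(a, b, c) for a, b, c in product(responses, repeat=3)
)
print("local maximum of G:", local_max)
```

Since the local polytope is the convex hull of the LD behaviors, the maximum over these 64 points is the local bound.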

The complexity of the local polytope² in this simplest multipartite Bell scenario has
probably discouraged similar studies for larger Bell scenarios.


5.1.4 All correlation inequalities for two inputs, two outputs, 
n players 


The other known systematic result in multipartite nonlocality is the classification of all
correlation inequalities for every (2, 2; 2, 2; ...; 2, 2) scenario with arbitrary number n
of players. These inequalities are referred to as WWZB inequalities (Werner and Wolf,
2001a; Żukowski and Brukner, 2002). To present the result, let us use the notation
$a^{(j)} \in \{-1,+1\}$ for all $j = 1, \ldots, n$. In this notation, the correlator $E_{x_1 \ldots x_n}$ is simply the
average value of the product of the outputs:

$$E_{x_1 \ldots x_n} \equiv E_{\mathbf{x}} = \left\langle \prod_{j=1}^{n} a^{(j)}_{x_j} \right\rangle. \tag{5.4}$$


2 The later characterization of the corresponding no-signaling polytope, a notion that we shall introduce in
chapter 9, made the picture even more appalling (Pironio, Bancal, and Scarani, 2011).

WWZB proved that every tight correlation inequality is of the form

$$\sum_{\mathbf{s} \in \{-1,+1\}^n} S(\mathbf{s}) \sum_{\mathbf{x} \in \{0,1\}^n} \left( \prod_{j=1}^{n} s_j^{x_j} \right) E_{\mathbf{x}} \le 2^n \tag{5.5}$$


where $S(\mathbf{s})$ can take only the values +1 and −1. Thus, this defines $2^{2^n}$ inequalities. Clearly,
there is some redundancy in the list. Once sorted out by symmetries, the following
number of inequivalent inequalities is found: 2 for n = 2 (we knew that: Positivity and
CHSH), 5 for n = 3 (positivity, CHSH, Mermin and two more; that is, the 41 others
are not correlation inequalities), 39 for n = 4, then ... more than 17476 for n = 5. Upon
presenting this, WW tersely conclude that “for n > 5 [...] listing all essentially different
inequalities is not going to be useful.” The conditions (5.5) can be summarized in one
non-linear inequality for the correlators,³

$$\sum_{\mathbf{s} \in \{-1,+1\}^n} \left| \sum_{\mathbf{x} \in \{0,1\}^n} \left( \prod_{j=1}^{n} s_j^{x_j} \right) E_{\mathbf{x}} \right| \le 2^n, \tag{5.6}$$

which has been useful in deriving some analytical results.
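
As an illustration, the left-hand side of (5.6) can be evaluated numerically. The sketch below is ours (the function names `wwzb_lhs` and `ld_correlators` are not from the literature); it assumes correlators are stored in a dictionary indexed by the string of inputs. It checks that local deterministic behaviors saturate the bound $2^n$, while the correlators of maximal CHSH violation exceed it.

```python
from itertools import product
from math import prod, sqrt


def wwzb_lhs(E, n):
    """Left-hand side of the nonlinear inequality (5.6).

    E maps each input string x in {0,1}^n to the correlator E_x.
    """
    total = 0
    for s in product((-1, 1), repeat=n):
        inner = sum(
            prod(s[j] ** x[j] for j in range(n)) * E[x]
            for x in product((0, 1), repeat=n)
        )
        total += abs(inner)
    return total


def ld_correlators(outs, n):
    """Correlators of a local deterministic behavior.

    outs[2*j + x] is the +/-1 output of player j on input x.
    """
    return {
        x: prod(outs[2 * j + x[j]] for j in range(n))
        for x in product((0, 1), repeat=n)
    }


# Every LD behavior gives exactly 2^n, so (5.6) holds with equality there
ld_values = {
    n: {wwzb_lhs(ld_correlators(outs, n), n)
        for outs in product((-1, 1), repeat=2 * n)}
    for n in (2, 3)
}

# Correlators of the maximal quantum violation of CHSH (n = 2)
r = 1 / sqrt(2)
tsirelson = {(0, 0): r, (0, 1): r, (1, 0): r, (1, 1): -r}
print(ld_values, wwzb_lhs(tsirelson, 2))
```

For n = 2 the inner sum factorizes as $(a_0 + s_1 a_1)(b_0 + s_2 b_1)$ on LD behaviors, which explains why the bound is saturated.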


5.2 Examples of Multipartite Bell Inequalities 


We are going to give some examples of multipartite Bell inequalities. We focus mostly
on a sub-family of WWZB, featuring one inequality for every number n of players,
usually called MABK from the names of Mermin (1990a), who first reported such a
construction, and Ardehali (1992) and Belinsky and Klyshko (1993a) who improved it.⁴


3 I have written this nonlinear inequality in the notation of ZB. In WW, it is equation (12), and the
commentary that follows it is arguably the cheekiest sentence in the field (my italics): “Obviously, this nonlinear
inequality is nothing but the characterization of the hyperoctahedron in $2^n$ dimensions as the unit sphere of
the Banach space $\ell^1$. From this simple characterization of [the set of local correlations] it might seem that our
problem is essentially trivial. However, ...”.

4 The review of Belinsky and Klyshko (1993b) provides an interesting window on the early days of
nonlocality. When it comes to this topic (section 5.1 of the paper), a paper by Roy and Singh (1991) is
mentioned, which somehow did not make it among the names retained in the acronym.


5.2.1 Construction of the MABK family


The MABK inequalities $\mathcal{M}_n = \langle M_n \rangle \le 1$ are defined by a recursive construction of the
correlators. The starting point of the recurrence is $\mathcal{M}_2 \le 1$ with

$$M_2 = \frac{1}{2}\left(a_0^{(1)} a_0^{(2)} + a_0^{(1)} a_1^{(2)} + a_1^{(1)} a_0^{(2)} - a_1^{(1)} a_1^{(2)}\right), \tag{5.7}$$

which is just CHSH renormalized for convenience. Then, given $M_{n-1}$, we have

$$M_n = \frac{1}{2}\left(M_{n-1}\,(a_0^{(n)} + a_1^{(n)}) + M'_{n-1}\,(a_0^{(n)} - a_1^{(n)})\right) \le 1 \tag{5.8}$$

where $M'_{n-1}$ is the equivalent version of the inequality obtained from $M_{n-1}$ by
permuting the labels of all players’ inputs. The local bound of (5.8) can be proved
recursively. Assume it holds for n − 1 players, that is, $M_{n-1}$ and $M'_{n-1}$ can only be +1
or −1 for LD behavior. Then, a LD behavior on the n-th player has the familiar feature:
If $a_0^{(n)} = a_1^{(n)} = \pm 1$, we have $(a_0^{(n)} + a_1^{(n)}) = \pm 2$ and $(a_0^{(n)} - a_1^{(n)}) = 0$; if $a_0^{(n)} = -a_1^{(n)} = \pm 1$,
we have $(a_0^{(n)} + a_1^{(n)}) = 0$ and $(a_0^{(n)} - a_1^{(n)}) = \pm 2$. In either case $M_n$ equals $\pm M_{n-1}$ or $\pm M'_{n-1}$, hence $\pm 1$.
Explicitly, the first iteration gives

$$M_3 = \frac{1}{2}\left(M_2\,(a_0^{(3)} + a_1^{(3)}) + M'_2\,(a_0^{(3)} - a_1^{(3)})\right)
= \frac{1}{2}\left(a_0^{(1)} a_0^{(2)} a_1^{(3)} + a_0^{(1)} a_1^{(2)} a_0^{(3)} + a_1^{(1)} a_0^{(2)} a_0^{(3)} - a_1^{(1)} a_1^{(2)} a_1^{(3)}\right)$$

that is,

$$\mathcal{M}_3 = \frac{1}{2}\left(E_{001} + E_{010} + E_{100} - E_{111}\right) \le 1 \tag{5.9}$$

and is the Mermin inequality (1.10) that we have encountered in the introduction,
rescaled by a factor $\frac{1}{2}$. Notice how only 4 out of 8 correlators appear in the expression.
The next iteration gives

$$\mathcal{M}_4 = \frac{1}{4}\big[-E_{0000} + (E_{0001} + E_{0010} + E_{0100} + E_{1000})
+ (E_{0011} + E_{0101} + E_{1001} + E_{0110} + E_{1010} + E_{1100})
- (E_{0111} + E_{1011} + E_{1101} + E_{1110}) - E_{1111}\big] \le 1. \tag{5.10}$$

This time, all 16 correlators appear in the expression. The inequality for 5 players $M_5$
has again 16 correlators instead of the possible 32 (Exercise 5.1). The general form is
$M_n = 2^{-\lfloor n/2 \rfloor}$ [sum of $2^{2\lfloor n/2 \rfloor}$ correlators], each correlator taken with the suitable sign; so the
algebraic maximum is $2^{\lfloor n/2 \rfloor}$. Also, the versions thus constructed are invariant under any
permutation of the n players (although obviously there exist versions that are equivalent
under relabelings and do not have this property).
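
The recursion (5.7)-(5.8) is easy to automate. The following sketch is only an illustration under our own conventions: a polynomial is stored as a dictionary mapping a string of inputs to the coefficient of the corresponding correlator, and the function names `mabk` and `local_max` are hypothetical. It reproduces the number of correlators and the algebraic maximum quoted above, and checks the local bound by enumeration for small n.

```python
from fractions import Fraction
from itertools import product
from math import prod


def mabk(n):
    """Coefficients of M_n built from (5.7)-(5.8), as {input string: coefficient}."""
    half = Fraction(1, 2)
    poly = {(0, 0): half, (0, 1): half, (1, 0): half, (1, 1): -half}  # M_2
    for _ in range(n - 2):
        # M': swap the input labels 0 <-> 1 of every player
        primed = {tuple(1 - x for x in xs): c for xs, c in poly.items()}
        new = {}
        for xs in set(poly) | set(primed):
            c, cp = poly.get(xs, 0), primed.get(xs, 0)
            # (1/2)[M (a_0 + a_1) + M' (a_0 - a_1)] for the added player
            for setting, coeff in ((0, half * (c + cp)), (1, half * (c - cp))):
                if coeff != 0:
                    new[xs + (setting,)] = coeff
        poly = new
    return poly


def local_max(poly, n):
    """Largest value of the polynomial over local deterministic strategies.

    outs[2*j + x] is the +/-1 output of player j on input x.
    """
    return max(
        sum(c * prod(outs[2 * j + x] for j, x in enumerate(xs))
            for xs, c in poly.items())
        for outs in product((-1, 1), repeat=2 * n)
    )


for n in range(2, 6):
    m = mabk(n)
    # n, number of correlators, algebraic maximum, local maximum
    print(n, len(m), sum(abs(c) for c in m.values()), local_max(m, n))
```

Exact rational arithmetic (`Fraction`) is used so that the vanishing coefficients are detected without rounding issues.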


5.2.2 Maximal violation in quantum theory 


In quantum theory, the $M_n$ should be seen as Bell operators

$$M_n = \frac{1}{2}\left(M_{n-1} \otimes (A_0^{(n)} + A_1^{(n)}) + M'_{n-1} \otimes (A_0^{(n)} - A_1^{(n)})\right) \tag{5.11}$$

where every $A_x^{(j)}$ is an operator with eigenvalues +1 and −1.
Let us first derive an upper bound on the quantum violation, by a recursive
proof. Suppose we have the upper bound for n − 1 players: $\langle M_{n-1} \rangle, \langle M'_{n-1} \rangle \le Q_{n-1}$.
Set $M_{n-1} = Q_{n-1} A$ and $M'_{n-1} = Q_{n-1} A'$, so that $M_n = \frac{Q_{n-1}}{2} C$ with
$C = A \otimes (A_0^{(n)} + A_1^{(n)}) + A' \otimes (A_0^{(n)} - A_1^{(n)})$. Now, $-\mathbb{I} \le A, A' \le \mathbb{I}$. Thus, C is a CHSH operator:
It can be written as (3.6) and the Tsirelson bound $\|C\|_\infty \le 2\sqrt{2}$ follows. Putting
everything together, we have got $Q_n \le \sqrt{2}\, Q_{n-1}$; since $Q_2 \le \sqrt{2}$, we conclude

$$\mathcal{M}_n \le 2^{(n-1)/2} \quad \text{for quantum behaviors.} \tag{5.12}$$


For odd n, this bound coincides with the algebraic bound $2^{\lfloor n/2 \rfloor}$, while for even n it is
smaller than it by a factor $\sqrt{2}$.

The bound (5.12) is attained, for any n, by suitable projective measurements on the
n-qubit GHZ state

$$|GHZ_n\rangle = \frac{1}{\sqrt{2}}\left(|{+\hat{z}}\rangle^{\otimes n} + |{-\hat{z}}\rangle^{\otimes n}\right) \tag{5.13}$$

(and only by those, in the sense of self-testing, chapter 7). The proof for arbitrary n is
too lengthy for our purposes (Belinsky and Klyshko, 1993a; Scarani and Gisin, 2001).
Let us work out explicitly the case n = 3, thus completing the claims left pending in
subsection 1.4.2. It holds

$$\sigma_x \otimes \sigma_x \otimes \sigma_x |GHZ_3\rangle = |GHZ_3\rangle \quad \text{and} \quad \sigma_x \otimes \sigma_y \otimes \sigma_y |GHZ_3\rangle = -|GHZ_3\rangle \tag{5.14}$$

plus permutations of the latter; in fact, these four eigenvalue equations define $|GHZ_3\rangle$
uniquely.⁵ Thus, by choosing $\hat{a}_0 = \hat{b}_0 = \hat{c}_0 = \hat{y}$ and $\hat{a}_1 = \hat{b}_1 = \hat{c}_1 = -\hat{x}$ we obtain
$E_{001} = E_{010} = E_{100} = -E_{111} = 1$, that is $\mathcal{M}_3 = 2$.
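
The case n = 3 can be verified with a few lines of code. The sketch below is an illustration under our own conventions: the state is stored as a dictionary from bit strings to amplitudes in the $\sigma_z$ eigenbasis, and the helpers `apply_pauli`, `expectation` and `correlator` are hypothetical names. Input 0 is mapped to $+\sigma_y$ and input 1 to $-\sigma_x$ for all three players.

```python
from math import sqrt


def apply_pauli(label, bit):
    """Action of a Pauli matrix on |bit>: returns (phase, new bit)."""
    if label == "X":
        return 1, 1 - bit
    if label == "Y":
        return (1j if bit == 0 else -1j), 1 - bit
    raise ValueError(label)


def expectation(paulis, state):
    """<state| P_1 x P_2 x ... |state> for a state stored as {bits: amplitude}."""
    total = 0
    for bits, amp in state.items():
        phase, new_bits = 1, []
        for label, bit in zip(paulis, bits):
            ph, nb = apply_pauli(label, bit)
            phase *= ph
            new_bits.append(nb)
        overlap = complex(state.get(tuple(new_bits), 0)).conjugate()
        total += overlap * phase * amp
    return total.real


ghz3 = {(0, 0, 0): 1 / sqrt(2), (1, 1, 1): 1 / sqrt(2)}

# Measurement choice: input 0 -> +sigma_y, input 1 -> -sigma_x
AXIS = {0: ("Y", +1), 1: ("X", -1)}


def correlator(settings):
    """E_xyz on the GHZ state for the measurement choice above."""
    labels = [AXIS[s][0] for s in settings]
    sign = 1
    for s in settings:
        sign *= AXIS[s][1]
    return sign * expectation(labels, ghz3)


m3 = 0.5 * (correlator((0, 0, 1)) + correlator((0, 1, 0))
            + correlator((1, 0, 0)) - correlator((1, 1, 1)))
print("M_3 on GHZ_3:", m3)
```

The value agrees with the bound (5.12) for n = 3 up to floating-point rounding.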


5.3 Various Scenarios of Multipartite Nonlocality 


5.3.1 Richness of nonlocal scenarios 


Let us begin with a parable (Figure 5.1, left). Alice, Bob, and Charlie are going to take 
part in a three-partite Bell test. Upon inspecting the venue, they realize that Bob’s and 
Charlie’s rooms, though reached through different hallways, are actually adjacent and 
separated by a very thin wall: They will be able to communicate during the game! Alice, 
however, will be properly isolated. Can they now cheat the verifier into believing that they
have a three-partite nonlocal resource?

Let’s suppose the verifier checks the Mermin inequality. By communicating, Bob
and Charlie can produce the correlations $E_{00} = E_{01} = E_{10} = -E_{11} = \pm 1$; they can
have agreed in advance in which rounds to use which option, and instructed Alice
to output $a_0 = a_1 = \pm 1$ accordingly. The verifier will then observe $\mathcal{M}_3 = 2$, as well
as all the marginals expected for the suitable measurements on the $|GHZ_3\rangle$ state.
Thus, the Mermin inequality can be maximally violated if only two players share some
nonlocal resource (in this case, communication), the third being classically correlated.
One says that the Mermin inequality does not test genuine multipartite nonlocality. This
observation was raised very early on by George Svetlichny (1987).

Figure 5.1 The Svetlichny scenarios of multipartite nonlocality: Various groups correlated with LVs,
with free communication inside each group. Left: The original scenario. Right: A possible multipartite
scenario.

5 The intuition of Mermin (1990a) in building the inequality (1.10) was precisely the set of eigenvalue
equations (5.14), related to the notion of stabilizer group that we won’t introduce here. This intuition carries
over to the MABK inequalities for odd n, but not for even n. The same intuition was later used to construct
inequalities tailored to other graph states (Scarani et al., 2005; Gühne et al., 2005).

The parable illustrates a remark that we made at the start of the chapter: While the 
definition of locality is the obvious extension of the bipartite one, there are many ways 
in which a behavior may be nonlocal. The next subsection is devoted to Svetlichny 
scenarios, the following subsection to a generalization. For a synthetic classification of 
multipartite nonlocality scenarios, the reader can refer to (Chaves et al., 2017). 


5.3.2 Svetlichny scenarios 


Svetlichny scenarios of multipartite nonlocality consist in grouping the players in several
groups, such that communication is free within each group. Then one derives inequalities
under the assumption that the different groups are correlated only through LVs
(Figure 5.1). Svetlichny (1987) constructed the inequality for three parties;⁶ the
generalization to n parties followed several years later (Collins et al., 2002b; Seevinck and
Svetlichny, 2002).

In a compact presentation inspired by that of (Bancal et al., 2011a), the Svetlichny-type
expression for n players is built recursively as

$$S_n = S_{n-1}\, a_0^{(n)} + S'_{n-1}\, a_1^{(n)} \tag{5.15}$$

from $S_1 = a_0^{(1)} + a_1^{(1)}$. The notation $S'_{n-1}$ indicates the relabeling $a_0^{(1)} \to -a_1^{(1)}$ and
$a_1^{(1)} \to a_0^{(1)}$ on the measurements and outcomes of only one player (here the first, but
by symmetry it could be any). The first iteration gives
$S_2 = a_0^{(1)} a_0^{(2)} + a_1^{(1)} a_0^{(2)} - a_1^{(1)} a_1^{(2)} + a_0^{(1)} a_1^{(2)}$, that is CHSH; and
$S'_2 = -a_1^{(1)} a_0^{(2)} + a_0^{(1)} a_0^{(2)} - a_0^{(1)} a_1^{(2)} - a_1^{(1)} a_1^{(2)}$ is another version of CHSH.

6 For the record, the paper has a typo in the expression of the inequality.

The first interesting case is the three-partite case studied by Svetlichny. One way
to read the expression $S_3 = S_2\, a_0^{(3)} + S'_2\, a_1^{(3)}$ is the following: When the verifier queries
Charlie with z = 0 (respectively, z = 1), he will test Alice and Bob with CHSH in the
form $S_2$ (respectively, $S'_2$). If Alice and Bob share LVs, then both $S_2$ and $S'_2$ will be
bounded by 2, and therefore

$$\mathcal{S}_3 = \langle S_3 \rangle \le 4, \tag{5.16}$$

with $\mathcal{S}_3 = E_{000} + E_{100} + E_{010} - E_{110} - E_{101} + E_{001} - E_{011} - E_{111}$. This holds true
even if Bob and Charlie can communicate freely, as in the previous parable. By symmetry,
the reasoning is independent of which pair of players can communicate. Therefore, the
Svetlichny inequality (5.16) holds for every behavior of the form

$$P(a, b, c|x, y, z) = \int d\lambda\, Q_1(\lambda)\, P_\lambda(a, b|x, y)\, P_\lambda(c|z)
+ \int d\mu\, Q_2(\mu)\, P_\mu(a, c|x, z)\, P_\mu(b|y)
+ \int d\nu\, Q_3(\nu)\, P_\nu(b, c|y, z)\, P_\nu(a|x), \tag{5.17}$$

where $\int d\lambda\, Q_1(\lambda) + \int d\mu\, Q_2(\mu) + \int d\nu\, Q_3(\nu) = 1$. These constraints define a polytope,
called the Svetlichny polytope; the inequality (5.16) is one of its facets. In the (2, 2;
2, 2; 2, 2) scenario, none of the non-trivial facets of the local polytope is also a facet of
this polytope (Pironio et al., 2011).
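
Both the parable of subsection 5.3.1 and the bound (5.16) can be checked by enumerating deterministic strategies. The sketch below is ours and rests on one simplifying assumption: since both expressions involve only full correlators, it is enough to enumerate the product of the outputs of the two communicating players as an arbitrary ±1 function of their two inputs. The names `S3`, `M3`, `best_with_communication` and `best_local` are hypothetical.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=3))

# Coefficients of the correlators E_xyz in the Svetlichny expression (5.16)
# (plus sign for at most one input equal to 1) and in the Mermin expression (5.9)
S3 = {xyz: (1 if sum(xyz) <= 1 else -1) for xyz in INPUTS}
M3 = {(0, 0, 1): 0.5, (0, 1, 0): 0.5, (1, 0, 0): 0.5, (1, 1, 1): -0.5}


def best_with_communication(coeffs, pair):
    """Maximum when the two players in `pair` communicate freely.

    The remaining player is local: the output depends on that player's input only.
    """
    lone = ({0, 1, 2} - set(pair)).pop()
    best = float("-inf")
    # joint[2*u + v]: product of the pair's outputs for their inputs (u, v)
    for joint in product((-1, 1), repeat=4):
        for solo in product((-1, 1), repeat=2):
            value = sum(
                c * joint[2 * xyz[pair[0]] + xyz[pair[1]]] * solo[xyz[lone]]
                for xyz, c in coeffs.items()
            )
            best = max(best, value)
    return best


def best_local(coeffs):
    """Maximum over fully local deterministic strategies."""
    best = float("-inf")
    for outs in product((-1, 1), repeat=6):
        a, b, c = outs[0:2], outs[2:4], outs[4:6]
        value = sum(k * a[x] * b[y] * c[z] for (x, y, z), k in coeffs.items())
        best = max(best, value)
    return best


pairs = [(0, 1), (0, 2), (1, 2)]
print("Mermin, local:", best_local(M3))
print("Mermin, Bob-Charlie communicating:", best_with_communication(M3, (1, 2)))
print("Svetlichny, any communicating pair:",
      [best_with_communication(S3, p) for p in pairs])
```

The Mermin expression reaches its quantum maximum as soon as one pair communicates, whereas the Svetlichny expression stays at 4 for every choice of the communicating pair.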

To obtain the maximal quantum value, we can work in a similar way as we did for the
MABK family: The Bell operator corresponding to $S_3$ is

$$S_3 = S_2 \otimes A_0^{(3)} + S'_2 \otimes A_1^{(3)} = A_+ \otimes (A_0^{(3)} + A_1^{(3)}) + A_- \otimes (A_0^{(3)} - A_1^{(3)})$$

with $A_\pm = \frac{1}{2}(S_2 \pm S'_2)$. Since $-2\mathbb{I} \le A_\pm \le 2\mathbb{I}$, we can achieve $\langle S_3 \rangle = 4\sqrt{2}$ in violation
of (5.16). This maximal violation is once again obtained for the $|GHZ_3\rangle$ state.

The case of n > 3 players follows the same pattern. For the Svetlichny bound, we can
work by recursion: Suppose that

$$\mathcal{S}_n = \langle S_n \rangle \le 2^{n-1} \tag{5.18}$$

holds for any bipartition of the n players (we have just proved that it holds for n = 3).
Then, when the verifier queries the (n + 1)-th player with $x^{(n+1)} = 0$ (respectively,
$x^{(n+1)} = 1$), he will test the other n players with $S_n$ (respectively, $S'_n$); whence
$\mathcal{S}_{n+1} \le 2^n$, confirming the recursion. The maximal quantum violation is attained by
suitable measurements on the $|GHZ_n\rangle$ state (5.13) and is found to be $\mathcal{S}_n = 2^{n-1}\sqrt{2}$,
exceeding the two-groups bound (5.18) by a factor $\sqrt{2}$.

Other symmetries of the recursive definition (5.15), as well as the relationship between
the Svetlichny and the MABK family, are proposed as Exercise 5.2. In summary, some
quantum behaviors exhibit genuine n-partite nonlocality in the sense of Svetlichny, for
every n.


5.3.3 Scenarios with directional signaling 


The Svetlichny definition of genuine multipartite nonlocality implies the possibility of 
signaling between some of the players. This has inspired two types of generalizations. 

The first generalization consists in using a refined classification of signaling behaviors.
In the definition (5.17) of the Svetlichny polytope, the two-player probabilities
$P_\lambda(a, b|x, y)$, $P_\mu(a, c|x, z)$ and $P_\nu(b, c|y, z)$ are unrestricted. But the verifier is free to
query the players at different times. For instance, getting back to our parable, the verifier
may query Bob first, and query Charlie only after receiving Bob’s answer. In this case, the
only acceptable $P_\nu(b, c|y, z)$ are those in which Bob can communicate to Charlie but not
the reverse. There may be behaviors that belong to the general Svetlichny polytope, but
that cannot be decomposed on such directional behaviors: Their genuine multipartite
nonlocality could then be detected with the suitable choice of timing⁷ (Bancal et al.,
2013). Keeping directionality in view is also the best way to avoid formal mistakes when
dealing with signaling behaviors (Gallego et al., 2012): We sketch the pitfalls in which
one can fall in Appendix G.3.


Figure 5.2 Approaching multipartite nonlocality taking into account the directionality of 
communication. Jones, Linden, and Massar proved that Svetlichny’s inequalities also hold as soon as 
there are two players in the network, such that no player knows both their inputs. Left: The simplest 
setting in which the players cannot be separated in non-communicating groups, and yet none of the three 
players gets to know both y and z. Right: A six-player scenario, in which nobody gets to know both the 
inputs of the two players denoted by a star. 


7 For the record, what ultimately converged in the paper (Bancal et al., 2013) was cited for several years as
“Barrett and Pironio, in preparation”.

The second generalization consists in considering other scenarios with limited
communication links (Figure 5.2). Instead of allowing free communication inside
non-communicating groups, communication can be restricted in its directionality: For
instance, Alice can signal to both Bob and Charlie, while they cannot reply nor
communicate between themselves. Such a scenario is not captured by (5.17), but
it turns out that Svetlichny’s inequality still holds. In fact Jones, Linden, and Massar
(2005) proved that in any n-player scenario, the bound (5.18) holds as soon as there is
at least one pair of players (i, j), such that no player knows both $x_i$ and $x_j$ (for instance, in the
example just put, no player knows both $y = x_B$ and $z = x_C$).


EXERCISES 


Exercise 5.1 Derive the MABK inequality for five players M5, using the iterative definition 
given in subsection 5.2.1. 


Exercise 5.2 In this exercise, we explore two consequences of the definition (5.15) of 
Svetlichny inequalities. 


1. Prove by recursion that 
1 
Sn=5 (Sn-rSk +S, Sz) (5.19) 


is valid for all k = 1, ..., n − 1. Hint: Recall that the permutation $'$ affects only one
player, and therefore $(S_{n-k} S_k)' = S'_{n-k} S_k$.

2. In (Collins et al., 2002b), the Svetlichny expression was defined in terms of the MABK 
expressions as follows: 


$$S_n = \begin{cases} M_n & \text{for } n \text{ even} \\ \frac{1}{2}(M_n + M'_n) & \text{for } n \text{ odd.} \end{cases} \tag{5.20}$$

The definition (5.15) leads to the same, up to some relabelings and scaling. Show it for
n = 3 and n = 4.


Part II


Nonlocality as a Tool 
for Certification 


This second part presents Bell nonlocality as a tool for applied physics. The key 
observation is simple: On the one hand, nonlocality is assessed on the basis of the 
observed behavior alone; on the other hand, in the framework of quantum theory, 
nonlocality can only happen in the presence of entanglement, which itself is a resource 
for tasks like randomness generation and quantum key distribution. Therefore, the 
observation of nonlocality certifies entanglement (and possibly its usefulness) in a device- 
independent way, i.e., without characterization of the physical degrees of freedom.


The Set of Quantum Behaviors


Go, go, go, said the bird: human kind/ 
Cannot bear very much reality. 
T.S. Eliot, Burnt Norton 


After an introduction to device-independent certification that motivates the whole Part 
II, this chapter deals with the definition of the set of quantum behaviors and with some 
useful outer approximations to it. 


6.1 Device-Independent Certification: A First Introduction 


In a Bell test, the players should be on top of what they are doing—to say it like a physicist: 
The experimentalist must have a good control of the degrees of freedom and of the 
corresponding measurements. But none of this is needed by the verifier. Bell nonlocality 
can be certified without any characterization of the devices. We say that it is a device- 
independent (DI) certification of nonlocality. 

Bell nonlocality has implications. We have seen that the only no-signaling deterministic 
strategies are the local deterministic ones (Exercise 2.2). Therefore, if the observed 
behavior is nonlocal and the resources are guaranteed to be no-signaling, the processes 
that produced the outputs are certainly random. One can then try and infer a bound 
on the amount of randomness from the certification of nonlocality. It would then be a 
DI certification of randomness: Something pretty disruptive, insofar as one is able to 
determine that a process is random without any modeling of the process itself. This is 
indeed possible, and we shall dedicate chapter 8 to it. Several other DI certifications 
can be based on nonlocality: Lower bounds on the secret key that can be created by 
quantum key distribution, on the amount of entanglement present in the system, on the 
dimensionality of the system... These are reviewed in Appendix F, alongside with the 
history of these discoveries and some confusions worth clarifying. 

The theory of DI certification is challenging: One wants to assess (say) how much 
randomness is present by assuming that quantum theory is correct, but without being 
able to specify even the dimension of the Hilbert space of the system under study. Even 
brute force optimization becomes impossible. This calls for the characterization of the 
set of quantum behaviors that was postponed in chapter 3. This chapter is devoted to it.


6.2 Definition and Geometry of the Quantum Set


6.2.1 The definition 


Given a Bell scenario, the quantum set Q is defined as the set of all the quantum behaviors 
in that scenario, i.e., the behaviors that can be observed assuming that the players share 
quantum resources. 

As for local behaviors, Q is first defined under the i.i.d. assumption; later we’ll prove
that the definition remains the same even if the underlying strategy is not i.i.d. Let us
start by repeating the definition of a quantum behavior given in section 3.1 using the
notations introduced there:


Definition 6.1 A behavior P belongs to the quantum set Q of the $(M_A, m_A; M_B, m_B)$ Bell
scenario if there exist:


• Two Hilbert spaces $\mathcal{H}_A$ and $\mathcal{H}_B$, of unspecified dimension, that could even be infinite

• A state $\rho$ in the space of linear operators on $\mathcal{H}_A \otimes \mathcal{H}_B$

• A family of $m_A$-output measurements $\{M_A^x \,|\, x \in \mathcal{X}\}$, whose elements $\Pi_a^x$ are linear
operators on $\mathcal{H}_A$

• A family of $m_B$-output measurements $\{M_B^y \,|\, y \in \mathcal{Y}\}$, whose elements $\Pi_b^y$ are linear
operators on $\mathcal{H}_B$,


such that

$$P(a, b|x, y) = \mathrm{Tr}(\rho\, \Pi_a^x \otimes \Pi_b^y). \tag{6.1}$$

The extension to multipartite Bell scenarios is immediate.


The fact that the dimensions of the players’ Hilbert spaces are left unspecified is 
important in several respects. First of all, it guarantees that the players’ strategies are 
not restricted. In particular, the state $\rho$ does not need to describe only the shared
entanglement, it can include an arbitrary amount of local information as well. 

Second, by Naimark’s theorem, we can consider only projective measurements without 
loss of generality. By the same token, one could enlarge the Hilbert space to include the 
purification of the state and assume that the state is pure. We shall do it occasionally, 
notably for self-testing (chapter 7). However, as we shall see in chapter 8, for some 
adversarial tasks the purification may not be in the players’ hands. 

The third consequence is the convexity of the set, see subsection 6.2.3. 


6.2.2 Another set, and the “Tsirelson problem” 


In the development of tools to study the quantum set, a few rigorous mathematical results
could be proved only by altering the definition of the set under study: 


Definition 6.2 A behavior P belongs to the set Q' of the $(M_A, m_A; M_B, m_B)$ Bell scenario if
there exist:

• A Hilbert space $\mathcal{H}$, of unspecified dimension, possibly infinite

• A state $\rho$ in the space of linear operators on $\mathcal{H}$

• A family of $m_A$-output measurements $\{M_A^x \,|\, x \in \mathcal{X}\}$, and a family of $m_B$-output
measurements $\{M_B^y \,|\, y \in \mathcal{Y}\}$, whose elements are linear operators on $\mathcal{H}$ and satisfy

$$[\Pi_a^x, \Pi_b^y] = 0 \quad \text{for all } x, a, y, b \tag{6.2}$$

such that

$$P(a, b|x, y) = \mathrm{Tr}(\rho\, \Pi_a^x\, \Pi_b^y). \tag{6.3}$$


In words, the requirement that Alice’s and Bob’s measurements are in tensor product is
replaced by the requirement that all the operators in the family $\{M_A^x \,|\, x \in \mathcal{X}\}$ commute
with all the operators in the family $\{M_B^y \,|\, y \in \mathcal{Y}\}$. It is clear that $Q \subseteq Q'$, because the
representation $\Pi_a^x \to \Pi_a^x \otimes \mathbb{I}$ and $\Pi_b^y \to \mathbb{I} \otimes \Pi_b^y$ satisfies (6.2). The reverse implication
would mean the following: When two complete sets of operators commute, it should be
possible to ascribe different degrees of freedom to them, i.e., to find a tensor product
structure. Tsirelson first claimed that this can be proved, but subsequently retracted
the general claim, and this question became known as “Tsirelson’s problem.” For an
introduction and early references, we refer to (Navascués et al., 2012); the state of
the art is best summarized in the introduction to (Coladangelo and Stark, 2018). In
a nutshell: Whenever the behavior P can be obtained with finite-dimensional Hilbert
spaces, the definition through the commutator implies the possibility of constructing a
tensor product representation; and it is also known that Q = Q' for the Bell scenario (2,
2; 2, 2) and a few other cases. However, in general Q is not a closed set (Slofstra, 2016):
Whether its closure is equal to, or strictly contained in, Q' is the remaining open
question.¹

In the following, we work mostly with (6.3) for simplicity of notation, but a tensor 
product can be read there too unless stated otherwise. 


1 Though of mathematical importance, the fact that Q is not closed is irrelevant in practice, since every
observed behavior will belong to the interior. However, if the closure $\bar{Q}$ were different from Q', this difference
could be the object of tests: Indeed, both are convex sets, so the separating hyperplane theorem implies that
there should be a constant gap between them. A difference may have further profound implications: For instance,
in the field of algebraic quantum field theory, localization is defined through commutators and not through
tensor products. After completion of this book, a proof was found that the two sets are not equivalent (Ji et al. 2020).


6.2.3 Geometry


The quantum set Q is a convex set. The proof relies on the fact that the Hilbert space
dimensions are unspecified:

$$P(a, b|x, y) = \sum_k p_k\, \mathrm{Tr}(\rho^k\, \Pi_a^{x,k}\, \Pi_b^{y,k}) = \mathrm{Tr}(\rho\, \Pi_a^x\, \Pi_b^y)$$

with² $\rho = \bigoplus_k p_k \rho^k$, $\Pi_a^x = \bigoplus_k \Pi_a^{x,k}$ and $\Pi_b^y = \bigoplus_k \Pi_b^{y,k}$. Conversely, if one fixes the Hilbert
space dimension of the shared resources, the set of achievable quantum behaviors is
generally not convex,³ although it may be in some cases (Donohue and Wolfe, 2015).
From convexity, it also follows that, if the underlying strategy is not i.i.d., the effective
single-round behavior still belongs to Q as just defined. The proof follows the same
pattern as the one for local behaviors given in subsection 2.6.1. Indeed, if
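$$P(a_1, a_2; b_1, b_2 \,|\, x_1, x_2; y_1, y_2) = \mathrm{Tr}\left(\rho\, \Pi^{x_1 x_2}_{a_1 a_2}\, \Pi^{y_1 y_2}_{b_1 b_2}\right),$$

the corresponding effective single-round behavior reads

$$P'(a, b|x, y) = \frac{1}{2}\mathrm{Tr}(\rho\, E^x_a E^y_b) + \frac{1}{2}\mathrm{Tr}(\rho\, F^x_a F^y_b) \tag{6.4}$$

with the two POVMs

$$E^x_a E^y_b = \sum_{a_2, x_2} \sum_{b_2, y_2} \Pi^{x x_2}_{a a_2}\, \Pi^{y y_2}_{b b_2}\, \frac{P(x, x_2, y, y_2)}{P'(x, y)},$$

$$F^x_a F^y_b = \sum_{a_1, x_1} \sum_{b_1, y_1} \Pi^{x_1 x}_{a_1 a}\, \Pi^{y_1 y}_{b_1 b}\, \frac{P(x_1, x, y_1, y)}{P'(x, y)}.$$

Here, $P'(x, y) = \frac{1}{2}\left[\sum_{x_2, y_2} P(x, x_2, y, y_2) + \sum_{x_1, y_1} P(x_1, x, y_1, y)\right]$. In summary, $P'(a, b|x, y)$
is a convex sum of behaviors that belong to Q, and thus belongs to Q as well.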

Contrary to the set of local behaviors, Q is not a polytope: It has continuously many
extremal points, even in the simplest Bell scenario. In other words, some of its boundaries
are not hyperplanes, but convex curves, the most famous example of which is presented
in the next subsection. The first attempts at systematic studies are uncovering a complex⁴
zoology of geometric features (Goh et al., 2018; Duarte et al., 2018; Rai et al., 2019).
Overall, the only known way to prove that a behavior is on the boundary of Q consists 
in showing that it has a quantum realization while proving that it lies at the boundary of 
a superset (thus, on that point, the superset and the set coincide). Techniques to define 
supersets of Q are discussed in section 6.3, while finding a quantum realization basically 
requires a good guess. 


2 One possible realization of these direct sums is simple: In each round, the players draw a value of k with
probability $p_k$ and act accordingly.

3 Fixing the dimension may have consequences for the local set too. For instance, if the fixed local dimension
of Alice’s (Bob’s) system is smaller than the number of outputs $m_A$ ($m_B$), one cannot even produce all the local
deterministic points of the Bell scenario.

4 At least, as complex as allowed by the fact that the set is convex.


6.2.4 An example of boundary


The most famous example of a curved boundary of Q is the boundary of the quantum
set of the (2, 2; 2, 2) scenario in the slice with unbiased marginals
$\langle a_0 \rangle = \langle a_1 \rangle = \langle b_0 \rangle = \langle b_1 \rangle = 0$. Usually it is referred to as TLM boundary⁵ (Tsirel’son, 1987; Landau, 1988;
Masanes, 2003a). Above the CHSH facet (2.29), the equation of this boundary is


$$\mathrm{Arcsin}(E_{00}) + \mathrm{Arcsin}(E_{01}) + \mathrm{Arcsin}(E_{10}) - \mathrm{Arcsin}(E_{11}) = \pi; \tag{6.5}$$

for the boundary above any of the other seven facets, one has to apply the corresponding
permutations.

The TLM boundary defines a three-dimensional surface embedded in $\mathbb{R}^4$ (recall that
$D_{NS} = 8$ for this Bell scenario). A possible parametrization is⁶

$$E_{xy} = \cos(\alpha_x - \beta_y) \quad \text{with } \alpha_1 \le \beta_0 \le \alpha_0 \le \beta_1,\ |\alpha_x - \beta_y| \le \pi. \tag{6.6}$$

These correlations can be achieved by measuring the maximally entangled state of two
qubits $|\Phi^+\rangle$ with measurements defined by the directions $\hat{a}_x = \cos\alpha_x\, \hat{z} + \sin\alpha_x\, \hat{x}$ and
$\hat{b}_y = \cos\beta_y\, \hat{z} + \sin\beta_y\, \hat{x}$. This proves that the TLM curve belongs to Q. The proof that it is also
an upper bound will be sketched in subsection 6.3.3; alternatively, it can be inferred from
the fact that the quantum realization just described is unique in the sense of self-testing,
see chapter 7.
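
A quick numerical check that the parametrization (6.6) satisfies the boundary equation (6.5) is sketched below. The function names `correlators`, `tlm_lhs` and `chsh` are ours, and the random sampling (seeded for reproducibility) is only an illustration, not a proof.

```python
import random
from math import asin, cos, isclose, pi, sqrt


def correlators(alpha, beta):
    """E_xy = cos(alpha_x - beta_y), with alpha = (alpha_0, alpha_1), beta = (beta_0, beta_1)."""
    return {(x, y): cos(alpha[x] - beta[y]) for x in (0, 1) for y in (0, 1)}


def tlm_lhs(E):
    """Left-hand side of the TLM equation (6.5)."""
    return asin(E[0, 0]) + asin(E[0, 1]) + asin(E[1, 0]) - asin(E[1, 1])


def chsh(E):
    return E[0, 0] + E[0, 1] + E[1, 0] - E[1, 1]


# The point of maximal CHSH violation: alpha_1 <= beta_0 <= alpha_0 <= beta_1
E_max = correlators(alpha=(pi / 2, 0.0), beta=(pi / 4, 3 * pi / 4))
print(chsh(E_max), tlm_lhs(E_max))

# Random angles respecting the ordering of (6.6), all within [0, pi]
rng = random.Random(0)
all_on_boundary = True
for _ in range(1000):
    a1, b0, a0, b1 = sorted(rng.uniform(0, pi) for _ in range(4))
    E = correlators(alpha=(a0, a1), beta=(b0, b1))
    all_on_boundary &= isclose(tlm_lhs(E), pi, abs_tol=1e-6)
print(all_on_boundary)
```

The absolute tolerance accounts for the loss of precision of the arcsine when a correlator is very close to ±1.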


6.3 Semidefinite Relaxations of the Quantum Set 


The quantum set is hard to characterize exactly, so it is often convenient to work with 
approximations. One obvious approximation consists in restricting the possible states 
and/or measurements, but the resulting set of behaviors is a subset of Q. In view of 
device-independent certification, we’d rather have supersets (or relaxations) of Q so that, 
if something is proved for one of these supersets, it is automatically proved for the whole 
of the quantum set. 

This section presents a construction of such supersets, whose membership is defined 
by compact and algorithmically efficient conditions. It was introduced in a systematic way 
by Navascués, Pironio, and Acín (2007; 2008) and thence referred to as NPA relaxations. 


6.3.1 The construction 


The NPA relaxations are based on the following observation: 


Lemma 6.1 Let $\mathcal{F} = \{F_1, \ldots, F_n\}$ be a collection of linear operators. Then, for any state $\rho$,
the hermitian matrix $\Gamma(\rho, \mathcal{F})$ whose entries are


5 Lluís Masanes derived this bound during his Ph.D. thesis, unaware that it was already known. When told
after posting his draft on the arXiv, he did not submit it for publication. Nonetheless, his derivation being
different from the two others, the community added his initial to the acronym.


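6 Indeed, if $\theta \in [0, \pi]$, using $\cos\theta = \sin(\frac{\pi}{2} - \theta)$ we have $\mathrm{Arcsin}(\cos\theta) = \frac{\pi}{2} - \theta$. With the order of angles
chosen in (6.6) and exploiting $\cos\theta = \cos(-\theta)$, the l.h.s. of (6.5) reads $(\frac{\pi}{2} - \alpha_0 + \beta_0) + (\frac{\pi}{2} - \beta_1 + \alpha_0) +
(\frac{\pi}{2} - \beta_0 + \alpha_1) - (\frac{\pi}{2} - \beta_1 + \alpha_1)$, which is indeed equal to $\pi$.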
$$\Gamma_{ij} = \mathrm{Tr}(\rho\, F_i^\dagger F_j) \tag{6.7}$$
is positive semidefinite.


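Proof The proof is very direct: For any vector $v \in \mathbb{C}^n$ it holds


$$v^\dagger \Gamma v = \mathrm{Tr}\Big[\rho\,\Big(\sum_i v_i F_i\Big)^\dagger \Big(\sum_j v_j F_j\Big)\Big] \ge 0$$


because both $\rho$ and any operator of the form $C^\dagger C$ are positive semidefinite.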
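The lemma can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the dimension, the number of operators and the random seed are arbitrary, and the helper names are hypothetical. It builds the moment matrix (6.7) for a random state and random (not even hermitian) operators, then evaluates the quadratic form on random vectors.

```python
import random

def dagger(M):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def random_matrix(d, rng):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d)] for _ in range(d)]

def moment_matrix(rho, ops):
    """Gamma_ij = Tr(rho F_i^dagger F_j), as in (6.7)."""
    return [[trace(matmul(rho, matmul(dagger(Fi), Fj))) for Fj in ops] for Fi in ops]

def quadratic_form(Gamma, v):
    """v^dagger Gamma v."""
    n = len(v)
    return sum(v[i].conjugate() * Gamma[i][j] * v[j] for i in range(n) for j in range(n))

rng = random.Random(0)
d, n = 3, 4                                   # arbitrary sizes for the illustration
G = random_matrix(d, rng)
GGd = matmul(G, dagger(G))                    # positive semidefinite by construction
norm = trace(GGd).real
rho = [[x / norm for x in row] for row in GGd]  # unit-trace state
ops = [random_matrix(d, rng) for _ in range(n)]
Gamma = moment_matrix(rho, ops)

samples = [quadratic_form(Gamma, [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)])
           for _ in range(200)]
print("smallest sampled value of v^dagger Gamma v:", min(s.real for s in samples))
```

Sampling vectors does not prove positivity, of course; it only shows the inequality of the proof at work.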


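Now, if $P \in Q$, we can consider the matrix $\Gamma_1 = \Gamma(\rho, \mathcal{F}_1)$ for the state $\rho$ and the
collection of $M_A m_A + M_B m_B$ projectors


$$\mathcal{F}_1 = \big\{\{\Pi^x_a \,|\, x \in \mathcal{X}, a \in \mathcal{A}\},\ \{\Pi^y_b \,|\, y \in \mathcal{Y}, b \in \mathcal{B}\}\big\}. \tag{6.8}$$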


If we knew the state and the measurements, we could compute all the entries of $\Gamma_1$. In
a device-independent approach, however, we don’t know anything a priori: we are just
assuming that the state and the projectors exist. Still, the value of some entries of $\Gamma_1$ is
determined by the behavior, namely $\mathrm{Tr}(\rho\,\Pi^x_a \Pi^y_b) = P(a,b|x,y)$, $\mathrm{Tr}(\rho\,\Pi^x_a) = P(a|x)$, and
$\mathrm{Tr}(\rho\,\Pi^y_b) = P(b|y)$. The other entries, involving two operators of Alice’s or two of Bob’s,
do not correspond to anything observed in a Bell test, but Lemma 6.1 guarantees that
they can be filled so as to have $\Gamma_1 \ge 0$. A contrario, suppose that after filling the suitable
entries of $\Gamma_1$ with the behavior, one were to find that there is no way to fill the other
entries to obtain $\Gamma_1 \ge 0$: Then surely $P \notin Q$.

Clearly, we have just described a necessary condition for a behavior P to belong to the
quantum set Q. For some parts of the quantum set, this condition is already sufficient: For
instance, the TLM boundary (6.5) can be recovered, see subsection 6.3.3. In general,
this condition is just the first of the NPA hierarchy of criteria (Navascués et al., 2007;
Navascués et al., 2008). The set of behaviors that satisfy $\Gamma_1 \ge 0$ is denoted $Q_1$.

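For the next step of the hierarchy, one constructs the matrix $\Gamma_2$ for the collection


$$\mathcal{F}_2 = \mathcal{F}_1 \cup \big\{\{\Pi^x_a \Pi^{x'}_{a'}\},\ \{\Pi^y_b \Pi^{y'}_{b'}\},\ \{\Pi^x_a \Pi^y_b\}\big\} \tag{6.9}$$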


where it is understood that all the new $(M_A m_A)^2 + (M_B m_B)^2 + M_A m_A M_B m_B$ operators
should be added to $\mathcal{F}_1$. The set of behaviors for which $\Gamma_2 \ge 0$ is denoted $Q_2$. The
positivity of $\Gamma_2$ is a necessary condition for $P \in Q$, tighter than the positivity of $\Gamma_1$ but
once again not sufficient in general. Notice that none of the entries involving operators
like $\Pi^x_a \Pi^{x'}_{a'}$ will be determined by the behavior; nonetheless, such terms enforce some
structure onto the matrix: For instance, $\mathrm{Tr}\big(\rho\,(\Pi^x_a \Pi^{x'}_{a'})^\dagger\, \Pi^x_{a''}\big) = \mathrm{Tr}(\rho\,\Pi^{x'}_{a'} \Pi^x_a)\,\delta_{a,a''}$.

The reader has probably guessed rightly that the n-th step of the hierarchy involves
constructing the matrix $\Gamma_n$ for the collection of all products of n or fewer projectors.
The set of behaviors for which $\Gamma_n \ge 0$ is denoted $Q_n$. Clearly, $Q_1 \supseteq Q_2 \supseteq Q_3 \supseteq \ldots$ and
$Q_n \supseteq Q'$ for all n. NPA proved that this hierarchy is convergent, i.e., $\lim_{n\to\infty} Q_n = Q'$.
Notice that the convergence is proved for $Q'$ and not for Q. Also, in many cases the
quantum result is already obtained for a finite n.

The elegance of the hierarchy should not conceal the fact that Lemma 6.1 allows for
full freedom in choosing the collection of operators $\mathcal{F}$. For instance, in section 10.5 we shall
discuss an interesting role for the set of behaviors $Q_{1+AB}$, obtained by only adding to $\mathcal{F}_1$
the operators of the form $\Pi^x_a \Pi^y_b$. Also it is good to keep in mind that adding a single
operator to an existing $\mathcal{F}$ may be enough to obtain a tighter bound. We shall use the
notation $Q_{\mathcal{F}}$ for the set of behaviors such that $\Gamma \ge 0$ for the collection of operators $\mathcal{F}$.

The fact that the NPA conditions are semidefinite requirements $\Gamma \ge 0$ is particularly
appealing because it suggests the possibility of using them in semidefinite programs
(SDPs). These are the next-to-simplest instance of convex optimization, after the linear
programs that we encountered in chapter 2. Just as with linear programs, given an SDP,
a dual SDP can be algorithmically constructed, and one can obtain simultaneously a
lower and an upper bound to the desired solution. For more details on SDPs we refer to
Appendix E and to the book of Boyd and Vandenberghe (2004). Now we are going to
see three examples of such SDPs; more will be presented in the next chapters.


6.3.2 Example 1: Membership in the quantum set 


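Let’s describe how the NPA techniques address the problem of membership, i.e.,
assessing whether $P \in Q$. For any choice of $\mathcal{F}$, one can relax the membership problem
to the following:


$$\begin{aligned} \lambda^* = \max\ & \lambda \\ \text{subject to}\ & \Gamma - \lambda I \ge 0 \\ & \Gamma_{(x,a)} = P(a|x),\ \ \Gamma_{(y,b)} = P(b|y),\ \ \Gamma_{(x,a)(y,b)} = P(a,b|x,y) \end{aligned} \tag{6.10}$$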


with the obvious definition of the entries $\Gamma_{(x,a)}$, $\Gamma_{(y,b)}$ and $\Gamma_{(x,a)(y,b)}$ of $\Gamma$. The variables
of the optimization are $\lambda \in \mathbb{R}$ and the other entries of $\Gamma$. These can also be taken real
without loss of generality, because the constrained entries are real. Indeed, suppose that
one finds a possibly complex $\Gamma \ge 0$ which satisfies the constraints: Then its complex
conjugate $\Gamma^*$ satisfies the constraints too; and therefore, the real part $\frac{1}{2}(\Gamma + \Gamma^*)$
satisfies the constraints as well.

If the algorithm yields $\lambda^* < 0$, it is impossible to find $\Gamma \ge 0$ that satisfies the constraints:
Then certainly $P \notin Q$. If $\lambda^* \ge 0$, $P \in Q_{\mathcal{F}} \supseteq Q$, so we are not sure that $P \in Q$. The
assessment can be tightened by choosing a larger collection $\mathcal{F}$. But the only conclusive
way to prove that $P \in Q$ is to exhibit an explicit realization of the behavior in terms of
state and projectors (having which, the SDP becomes superfluous).


6.3.3 Example 2: The TLM bound 


Suppose that the verifier has performed a (2, 2; 2, 2) Bell test and estimated the
correlators $(E_{00}, E_{01}, E_{10}, E_{11})$. For these correlators to belong to the quantum set, there
must exist a state and projectors such that
$$E_{xy} = \mathrm{Tr}\big[\rho\,(\Pi^x_{+1} - \Pi^x_{-1})(\Pi^y_{+1} - \Pi^y_{-1})\big] \equiv \mathrm{Tr}(\rho\, F^A_x F^B_y). \tag{6.11}$$


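For the collection of operators $\mathcal{F} = (F^A_0, F^A_1, F^B_0, F^B_1)$ we have


$$\Gamma = \begin{pmatrix} 1 & u_1 & E_{00} & E_{01} \\ u_1 & 1 & E_{10} & E_{11} \\ E_{00} & E_{10} & 1 & u_2 \\ E_{01} & E_{11} & u_2 & 1 \end{pmatrix}. \tag{6.12}$$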


The diagonal elements are 1 since $F^2 = I$, due to the orthogonality conditions
$\Pi^x_a \Pi^x_{a'} = \delta_{a,a'}\,\Pi^x_a$ and the same for the $\Pi^y_b$. Once again, it is not restrictive to assume $u_1, u_2 \in \mathbb{R}$
since all the constrained entries are real. The condition for $\Gamma \ge 0$ reads $\mathrm{Arcsin}(E_{00}) +
\mathrm{Arcsin}(E_{01}) + \mathrm{Arcsin}(E_{10}) - \mathrm{Arcsin}(E_{11}) \le \pi$, as can be proved in two different ways
(Landau, 1988; Navascués et al., 2008). Thus the TLM condition (6.5) is an upper
bound to the set of quantum correlations. Since we know it can be reached, it is right on
the boundary.
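For this small example the membership program (6.10) can be imitated without an SDP solver. The sketch below is an assumption-laden stand-in, not the NPA implementation: it replaces the SDP by a coarse grid search over the two free entries $u_1, u_2$ of (6.12), and obtains the smallest eigenvalue by bisection on a Cholesky test. All function names are hypothetical.

```python
import math

def is_positive_definite(M):
    """Cholesky test: True iff the real symmetric matrix M is positive definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

def min_eigenvalue(M, lo=-4.0, hi=4.0, iters=40):
    """Bisection on t: M - t*I is positive definite iff t < lambda_min(M)."""
    n = len(M)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        shifted = [[M[i][j] - (mid if i == j else 0.0) for j in range(n)] for i in range(n)]
        if is_positive_definite(shifted):
            lo = mid
        else:
            hi = mid
    return lo

def gamma(E, u1, u2):
    """The matrix (6.12) for correlators E = (E00, E01, E10, E11)."""
    E00, E01, E10, E11 = E
    return [[1.0, u1, E00, E01],
            [u1, 1.0, E10, E11],
            [E00, E10, 1.0, u2],
            [E01, E11, u2, 1.0]]

def best_lambda(E, grid=21):
    """Grid search over (u1, u2) in [-1, 1]^2 for the largest lambda_min, cf. (6.10)."""
    us = [-1.0 + 2.0 * k / (grid - 1) for k in range(grid)]
    return max(min_eigenvalue(gamma(E, u1, u2)) for u1 in us for u2 in us)

def tlm_lhs(E):
    """Left-hand side of the TLM condition for E = (E00, E01, E10, E11)."""
    return math.asin(E[0]) + math.asin(E[1]) + math.asin(E[2]) - math.asin(E[3])

s = 1.0 / math.sqrt(2.0)
E_inside = (0.9 * s, 0.9 * s, 0.9 * s, -0.9 * s)    # 0.9 x the Tsirelson point
E_outside = (1.1 * s, 1.1 * s, 1.1 * s, -1.1 * s)   # 1.1 x the Tsirelson point
E_pr = (1.0, 1.0, 1.0, -1.0)                        # PR-box correlators

lam_inside, lam_outside, lam_pr = (best_lambda(E) for E in (E_inside, E_outside, E_pr))
print(lam_inside, tlm_lhs(E_inside) <= math.pi)
print(lam_outside, tlm_lhs(E_outside) <= math.pi)
print(lam_pr)
```

A positive value signals that a positive semidefinite completion was found on the grid; a negative value on a coarse grid is only indicative, and a proper SDP solver would be needed for a certified answer.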


6.3.4 Example 3: Maximal quantum violation of a Bell inequality


Finally, let us consider the problem of finding the maximal violation of a Bell inequality 
in quantum theory, which was actually the first application of SDPs in the context of 
nonlocality (Wehner, 2006). 

Assuming that the inequality reads $I \le I_L$, the problem we want to solve is to compute


$$\begin{aligned} I_Q = \max\ & I(P) \\ \text{subject to}\ & P \in Q. \end{aligned} \tag{6.13}$$


Unless one finds analytical arguments like the Tsirelson bound, this problem cannot be
solved efficiently: At best, one can resort to heuristic optimization over a class of states
and measurements, i.e., over a subset of Q. Because of this, or because the optimization
itself may have converged to a local maximum, the outcome satisfies $I_{heur} \le I_Q$.

The NPA techniques allow relaxing the program (6.13) to the SDP


$$\begin{aligned} I_{\mathcal{F}} = \max\ & I(P) \\ \text{subject to}\ & \Gamma(P) \ge 0\ \ [\text{i.e., } P \in Q_{\mathcal{F}}]. \end{aligned} \tag{6.14}$$


In this SDP, all the entries of $\Gamma$ are variables. In order to obtain a non-trivial solution, all
the $P(a,b|x,y)$ that appear in $I(P)$ must correspond to an entry in $\Gamma$: This defines the
minimal collection $\mathcal{F}$. The SDP outputs both an upper and a lower bound on $I_{\mathcal{F}}$, that
usually coincide within numerical precision. Now we are guaranteed to have $I_{\mathcal{F}} \ge I_Q$,
because we are optimizing over a superset of Q. The larger the collection of operators
$\mathcal{F}$, the closer $I_{\mathcal{F}}$ will be to the actual $I_Q$. Eventually, if $I_{\mathcal{F}}$ is found equal to an explicit
achievable quantum value (a numerical $I_{heur}$ or the result of some calculation), then we
have found $I_Q$. This is how, in chapters 4 and 5, we could confidently cite the quantum
maximum of several inequalities.
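The sandwich $I_{heur} \le I_Q \le I_{\mathcal{F}}$ can be made concrete for CHSH. The sketch below rests on assumptions that are not in the text: the heuristic is a plain random search over measurement angles on $|\Phi^+\rangle$, and the upper bound comes from a hand-written dual certificate Z for the relaxation built on $(A_0, A_1, B_0, B_1)$. For any matrix $\Gamma$ of the form (6.12) one has $\mathrm{Tr}(Z\Gamma) = 2\sqrt{2} - S$, and $Z \ge 0$ because Z is symmetric with $Z^2 = \sqrt{2}\,Z$. Hence $S \le 2\sqrt{2}$ whenever $\Gamma \ge 0$.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def chsh_qubit(a0, a1, b0, b1):
    """CHSH value of |Phi+> with x-z plane measurements, using E_xy = cos(a_x - b_y)."""
    return (math.cos(a0 - b0) + math.cos(a0 - b1)
            + math.cos(a1 - b0) - math.cos(a1 - b1))

def heuristic_max(trials=2000, seed=1):
    """Random search over angles: a lower bound I_heur <= I_Q."""
    rng = random.Random(seed)
    best = -4.0
    for _ in range(trials):
        angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(4)]
        best = max(best, chsh_qubit(*angles))
    return best

def dual_certificate():
    """Z = (1/2) [[sqrt2*I, -W], [-W^T, sqrt2*I]] with W = [[1, 1], [1, -1]]."""
    W = [[1.0, 1.0], [1.0, -1.0]]
    Z = [[0.0] * 4 for _ in range(4)]
    for i in range(2):
        Z[i][i] = Z[i + 2][i + 2] = SQRT2 / 2.0
        for j in range(2):
            Z[i][j + 2] = Z[j + 2][i] = -W[i][j] / 2.0
    return Z

def gamma(E, u1, u2):
    """Moment matrix of the form (6.12)."""
    E00, E01, E10, E11 = E
    return [[1.0, u1, E00, E01], [u1, 1.0, E10, E11],
            [E00, E10, 1.0, u2], [E01, E11, u2, 1.0]]

def trace_product(Z, G):
    return sum(Z[i][j] * G[j][i] for i in range(4) for j in range(4))

Z = dual_certificate()
Z_squared = [[sum(Z[i][k] * Z[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

I_heur = heuristic_max()
I_exact = chsh_qubit(0.0, math.pi / 2, math.pi / 4, -math.pi / 4)
I_upper = 2.0 * SQRT2
print("I_heur =", I_heur, " explicit realization =", I_exact, " upper bound =", I_upper)
```

Since the explicit realization reaches the upper bound, the relaxation is tight here, which is the situation described in the last sentences above.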


EXERCISES 


These exercises involve running SDPs on a computer. Packages can be found online for
all the major mathematical software.⁷


Exercise 6.1 As we shall see in section 7.1, the only quantum behavior that achieves the
Tsirelson bound $S = 2\sqrt{2}$ is the behavior (3.11), characterized by $\langle A_x\rangle = \langle B_y\rangle = 0$ and
$\langle A_xB_y\rangle = (-1)^{xy}/\sqrt{2}$. In this exercise, we propose to rediscover this result by running SDPs.
For instance, to prove that $\langle A_x\rangle = 0$ is the only solution, you need to run two SDPs, those with
objectives $\max\langle A_x\rangle$ and $\min\langle A_x\rangle$.
(a) Run your checks at the level $Q_1$ of the hierarchy, i.e., for a moment matrix built on
$\mathcal{F} = \{I, A_0, A_1, B_0, B_1\}$. Keep all the entries of the moment matrix as variables (apart
from the entry that is 1) and impose only the constraint $S = 2\sqrt{2}$. You should find that
this level of the hierarchy is not enough to certify uniqueness.
(b) Repeat for the level $Q_{1+AB}$ of the hierarchy, i.e., for a moment matrix built on $\mathcal{F} =
\{I, A_0, A_1, B_0, B_1, A_0B_0, A_0B_1, A_1B_0, A_1B_1\}$. This time, you should be able to prove
uniqueness (up to numerical precision).


Exercise 6.2. Using the NPA method at the level $Q_1$, prove that the quantum minimum for
Mermin’s outreach Bell test (subsections 1.4.1 and 4.2.2) is O = 4.5.


7 A CVX code to implement NPA is available in Qetlab (www.qetlab.com). 
Device-Independent Self-Testing


You can observe a lot by just watching. 


Yogi Berra 


Our first contact with DI certification is device-independent self-testing, that was 
mentioned a few times in chapters 3-5: In a sense to be made precise in this chapter, 
some behaviors are possible only with a unique choice of state and measurements. 


7.1 Self-Testing of the Maximal Violation of CHSH 


As a first case study, we are going to show that $S = 2\sqrt{2}$ for the CHSH test can
only be reached by measuring a two-qubit maximally entangled state with projective
measurements in mutually unbiased bases. We follow (Popescu and Rohrlich, 1992b);
the same result can be identified in previous works, in a more abstract mathematical
framework (Tsirel’son, 1987; Summers and Werner, 1987).


7.1.1 The proof 


We resume from the proof of the Tsirelson bound (subsection 3.2.2), working with
projective measurements without loss of generality since the dimension of the Hilbert
space is not fixed. In this case, the operators $A_0$, $A_1$, $B_0$ and $B_1$ have eigenvalues −1 and
+1; and we saw that $S = 2\sqrt{2}$ is only achievable if


$$\mathrm{Tr}\big(\rho\,[A_0, A_1] \otimes [B_0, B_1]\big) = -4. \tag{7.1}$$


Given that $\|[A_0, A_1]\|_\infty \le 2$ and $\|[B_0, B_1]\|_\infty \le 2$, this implies that $[A_0, A_1]$ and
$[B_0, B_1]$ must have, in some subspaces, the eigenvalue $+2i$ or $-2i$. Let’s denote those
subspaces by their projectors $\Pi^A_s$ and $\Pi^B_s$, with $s = \pm$ labeling the eigenvalue $2is$. In all those
subspaces, the anticommutators $\{A_0, A_1\}$ and $\{B_0, B_1\}$ have the eigenvalue 0, since
$[A_0, A_1]^2 = \{A_0, A_1\}^2 - 4I$ holds using $A_0^2 = A_1^2 = I$.

Besides, the support of $\rho$ must be in the right subspaces for (7.1) to hold:


$$\rho = \sum_{s,s'=\pm} (\Pi^A_s \otimes \Pi^B_s)\,\rho\,(\Pi^A_{s'} \otimes \Pi^B_{s'}). \tag{7.2}$$


Bell Nonlocality. Valerio Scarani. © Valerio Scarani 2019. Published in 2019 by Oxford University Press. 
DOI: 10.1093/0s0/9780198788416.001.0001 
To keep the notation simple, in what follows we restrict $A_x$ and $B_y$ to those subspaces,
i.e., $A_x$ actually stands for $\sum_{s,s'} \Pi^A_s A_x \Pi^A_{s'}$ and $B_y$ for $\sum_{s,s'} \Pi^B_s B_y \Pi^B_{s'}$; the more general
notation will be re-introduced in the next subsection.

Having noticed this, the following can be proved using Jordan’s lemma (see Appendix
G.4) or in other ways (Kaniewski, 2017): If there is a subspace of $\{A_0, A_1\}$ associated
to the eigenvalue 0, it must be of even dimension $2d''_A$; besides, there exists a choice of
basis in which


$$A_0 = \sigma_z \otimes I_{d''_A} \quad\text{and}\quad A_1 = \sigma_x \otimes I_{d''_A}. \tag{7.3}$$


The same of course holds on Bob’s side. Let’s now plug the expressions for the local
operators for the measurements into the Bell operator (3.4). To simplify this step
we choose Bob’s basis such that $B_0 = \frac{1}{\sqrt{2}}(\sigma_z + \sigma_x) \otimes I_{d''_B}$ and $B_1 = \frac{1}{\sqrt{2}}(\sigma_z - \sigma_x) \otimes I_{d''_B}$.
Then $S = \sqrt{2}\,(\sigma_z \otimes \sigma_z + \sigma_x \otimes \sigma_x) \otimes I_{d''_A} \otimes I_{d''_B}$, and $\mathrm{Tr}(S\rho) = 2\sqrt{2}$ can be achieved if and
only if $\langle\sigma_z \otimes \sigma_z\rangle = \langle\sigma_x \otimes \sigma_x\rangle = 1$. This requirement uniquely identifies the Bell state $|\Phi^+\rangle$.
Therefore we conclude


$$\rho = |\Phi^+\rangle\langle\Phi^+| \otimes \sigma, \tag{7.4}$$


where $\sigma$ can be any state.

In words, we have established that, in order to achieve $S = 2\sqrt{2}$, Alice and Bob must
share a two-qubit maximally entangled state and perform projective measurements in
mutually unbiased bases. This is self-testing of both the state and the measurements.
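The algebra of the ideal realization is easy to verify on the two relevant qubits. The following sketch assumes the basis choice $A_0 = \sigma_z$, $A_1 = \sigma_x$, $B_0 = (\sigma_z + \sigma_x)/\sqrt{2}$, $B_1 = (\sigma_z - \sigma_x)/\sqrt{2}$ and the state $|\Phi^+\rangle$, drops the identity factors on the extra degrees of freedom, and uses hypothetical helper names. All matrices involved are real, including the commutators, since $[\sigma_z, \sigma_x] = 2i\sigma_y$ is a real matrix.

```python
import math

R = 1.0 / math.sqrt(2.0)
SZ = [[1.0, 0.0], [0.0, -1.0]]
SX = [[0.0, 1.0], [1.0, 0.0]]

def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def lin(c1, A, c2, B):
    """Entrywise linear combination c1*A + c2*B."""
    return [[c1 * x + c2 * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def commutator(A, B):
    return lin(1.0, matmul(A, B), -1.0, matmul(B, A))

def expectation(M, v):
    """<v|M|v> for a real vector v."""
    return sum(v[i] * M[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))

A0, A1 = SZ, SX
B0 = lin(R, SZ, R, SX)
B1 = lin(R, SZ, -R, SX)

# Bell operator A0 B0 + A0 B1 + A1 B0 - A1 B1 on the two relevant qubits.
S_op = lin(1.0, lin(1.0, kron(A0, B0), 1.0, kron(A0, B1)),
           1.0, lin(1.0, kron(A1, B0), -1.0, kron(A1, B1)))
S_expected = lin(math.sqrt(2.0), kron(SZ, SZ), math.sqrt(2.0), kron(SX, SX))

phi_plus = [R, 0.0, 0.0, R]
chsh_value = expectation(S_op, phi_plus)
comm_value = expectation(kron(commutator(A0, A1), commutator(B0, B1)), phi_plus)
print("CHSH value:", chsh_value, " commutator expectation in (7.1):", comm_value)
```

This only confirms that the ideal realization satisfies the conditions; the uniqueness statement is the content of the proof above.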


7.1.2 Consolidation 


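Having proved the self-testing character of $S = 2\sqrt{2}$ in a series of inferences, let us
summarize the result with a more precise notation. Consider the following decomposition
of Alice and Bob’s Hilbert spaces:


$$\mathcal{H}_A = [\mathcal{H}_{A'} \otimes \mathcal{H}_{A''}] \oplus \mathcal{H}_{A'''} \tag{7.5}$$
$$\mathcal{H}_B = [\mathcal{H}_{B'} \otimes \mathcal{H}_{B''}] \oplus \mathcal{H}_{B'''} \tag{7.6}$$
where $\dim(\mathcal{H}_K) = d_K$ unspecified, $\dim(\mathcal{H}_{K'}) = 2$, $\dim(\mathcal{H}_{K''}) = d''_K \le d_K/2$ but otherwise
unspecified, and $\dim(\mathcal{H}_{K'''}) = d_K - 2d''_K$, for K = A, B.
Then up to local unitaries the state reads


$$\rho_{AB} = |\Phi^+\rangle\langle\Phi^+|_{A'B'} \otimes \sigma_{A''B''} \tag{7.7}$$


with no support in $\mathcal{H}_{A'''}$ and $\mathcal{H}_{B'''}$. Notice that $\sigma$ is arbitrary and could be entangled.
With the same choice of basis, the measurement operators read


$$\Pi^x_a = \Big[\Big(\frac{I + a\,\hat a_x \cdot \vec\sigma}{2}\Big)_{A'} \otimes I_{A''}\Big] \oplus (\tilde\Pi^x_a)_{A'''}, \tag{7.8}$$
$$\Pi^y_b = \Big[\Big(\frac{I + b\,\hat b_y \cdot \vec\sigma}{2}\Big)_{B'} \otimes I_{B''}\Big] \oplus (\tilde\Pi^y_b)_{B'''} \tag{7.9}$$


where the $\tilde\Pi$ are arbitrary and where


$$\hat a_0 = \hat z,\quad \hat a_1 = \hat x,\quad \hat b_0 = \frac{\hat z + \hat x}{\sqrt{2}},\quad \hat b_1 = \frac{\hat z - \hat x}{\sqrt{2}}. \tag{7.10}$$


This is what is meant by saying that $S = 2\sqrt{2}$ self-tests the maximally entangled state
of two qubits and mutually unbiased measurements on it by both players. As a consequence
of self-testing, there is only one quantum behavior that reaches $S = 2\sqrt{2}$,
namely (3.11).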


7.1.3 Two worked out examples 


For pedagogical reasons let us consider two states that lead to $S = 2\sqrt{2}$ and show
explicitly that they can indeed be written in the form (7.7).
The first example is a state of the form


$$|\psi\rangle = \sum_{j=1}^{d/2} \sqrt{c_j}\; \frac{|2j-1\rangle|2j-1\rangle + |2j\rangle|2j\rangle}{\sqrt{2}} \tag{7.11}$$


with d even. The fact that $S = 2\sqrt{2}$ can be achieved with this state is a particular case
of the calculation done in section 3.2.4 with the state (3.20). By recording the parity
of the ket label in a qubit and the value of j in a (d/2)-dimensional subsystem, i.e.,
$|2j-1\rangle_K = |1\rangle_{K'} \otimes |j\rangle_{K''}$ and $|2j\rangle_K = |0\rangle_{K'} \otimes |j\rangle_{K''}$ for K = A, B, the state is indeed cast
in the form (7.7)


$$|\psi\rangle = |\Phi^+\rangle_{A'B'} \otimes \Big[\sum_{j=1}^{d/2} \sqrt{c_j}\,|j\rangle_{A''}|j\rangle_{B''}\Big]. \tag{7.12}$$


We started from a pure state (7.11) that is a superposition of singlets in different
subspaces; but the same isometry could have extracted a singlet in A'B' even if we
had started from an incoherent mixture of those singlets. Thus, one can violate CHSH
maximally with a mixed state, as first observed by (Braunstein et al., 1992). A similar
case study is proposed as Exercise 7.1.
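The relabeling used in this first example can be spelled out in a few lines. The sketch below is only illustrative: the weights (called c here, a symbol assumed for the illustration) are hypothetical numbers summing to 1, and states are stored as dictionaries from ket labels to amplitudes. It splits each label into a parity qubit and a block index, then compares with the product of $|\Phi^+\rangle$ on the two qubits and the weighted sum over blocks.

```python
import math

def state_superposition(c):
    """Amplitudes of the initial state as {(ket_A, ket_B): amplitude}, labels 1..d."""
    psi = {}
    for j, cj in enumerate(c, start=1):
        for k in (2 * j - 1, 2 * j):
            psi[(k, k)] = math.sqrt(cj / 2.0)
    return psi

def split(k):
    """|2j-1> -> |1>|j> and |2j> -> |0>|j>: returns (qubit, j)."""
    j = (k + 1) // 2
    return (1 if k % 2 else 0, j)

def product_form(c):
    """|Phi+> on the qubits times sum_j sqrt(c_j)|j>|j>, keyed as ((qA, jA), (qB, jB))."""
    out = {}
    for j, cj in enumerate(c, start=1):
        for q in (0, 1):
            out[((q, j), (q, j))] = math.sqrt(cj) / math.sqrt(2.0)
    return out

c = [0.5, 0.3, 0.2]  # hypothetical weights, so d = 6
relabelled = {(split(ka), split(kb)): amp
              for (ka, kb), amp in state_superposition(c).items()}
target = product_form(c)
print(sorted(relabelled) == sorted(target))
```

The relabeling is a local change of basis on each side, which is why it qualifies as a local isometry.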

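As a second example, we consider the Smolin state of four qubits


$$\rho_{ABCD} = \frac{1}{4}\big(\Pi_{\Phi^+\Phi^+} + \Pi_{\Phi^-\Phi^-} + \Pi_{\Psi^+\Psi^+} + \Pi_{\Psi^-\Psi^-}\big) \tag{7.13}$$
with¹ $\Pi_{\psi\psi} = (|\psi\rangle\langle\psi|)_{AB} \otimes (|\psi\rangle\langle\psi|)_{CD}$. If the four players can only apply local operations
and classical communications, this is a “bound entangled” state. However, it was
observed that if BCD = E are treated as a single player holding an 8-dimensional system,
one can reach $S = 2\sqrt{2}$ with measurements on this state (Augusiak and Horodecki,
2006).

In order to bring the Smolin state to the form (7.7), we recall that $|\Phi^+\rangle_{AB} = I \otimes
\sigma_z|\Phi^-\rangle_{AB} = I \otimes \sigma_x|\Psi^+\rangle_{AB} = I \otimes (i\sigma_y)|\Psi^-\rangle_{AB}$. Therefore, E can apply the unitary


$$U_E = I \otimes |\Phi^+\rangle\langle\Phi^+| + \sigma_z \otimes |\Phi^-\rangle\langle\Phi^-| + \sigma_x \otimes |\Psi^+\rangle\langle\Psi^+| + (i\sigma_y) \otimes |\Psi^-\rangle\langle\Psi^-| \tag{7.14}$$
where the Bell basis is on CD. Then


$$U_E\,\rho_{ABCD}\,U_E^\dagger = (|\Phi^+\rangle\langle\Phi^+|)_{AB} \otimes \Big(\frac{I}{4}\Big)_{CD}. \tag{7.15}$$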

7.2 The Mayers-Yao Self-Testing Behavior 


Our second case study follows the work of Mayers and Yao (1998; 2004) in the context 
of quantum cryptography. The original work was in the (3, 2; 3, 2) Bell scenario, but we 
present a variation in the (2, 2; 3, 2) scenario (McKague et al., 2012). The outputs are 
labeled a, b € {—1, +1}, so we can parametrize 


P(a,b|x,y) = — (1 + a (Ax) + 6(By) + ab(A,By)) . 


| 


The MY behavior is defined by 


(Ax) = (By) =0, x € {0, 1}, y€ {0, 1, 2} (7.16) 
(AoBo) = (A1B1) = 1 (7.17) 
(AoB,) = (A; Bo) = 0 (7.18) 
1 
AoB2) =(A,B2) = —. 7.19 
(ApB2) = (41 B2) aa (7.19) 


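1 Notice that $\rho$ remains the same if we were to use $\Pi_{\psi\psi} = (|\psi\rangle\langle\psi|)_{AC} \otimes (|\psi\rangle\langle\psi|)_{BD}$ or $\Pi_{\psi\psi} =
(|\psi\rangle\langle\psi|)_{AD} \otimes (|\psi\rangle\langle\psi|)_{BC}$.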
This is the behavior that one obtains by measuring the state $|\Phi^+\rangle$ with $A_0 = B_0 = \sigma_z$,
$A_1 = B_1 = \sigma_x$ and $B_2 = \frac{1}{\sqrt{2}}(\sigma_z + \sigma_x)$. We need to prove that this is actually the only
possible realization, in the sense of self-testing.

As a sanity check, notice that the MY behavior is nonlocal:² It violates CHSH as
$\langle A_0B_2\rangle + \langle A_0B_0\rangle + \langle A_1B_2\rangle - \langle A_1B_0\rangle = \sqrt{2} + 1 > 2$. It’s also easy to check by inspection
that this is the highest violation of CHSH achievable by combining the statistics in the
behavior: Since $\sqrt{2} + 1 < 2\sqrt{2}$, this form of self-testing is genuinely different from that
of the previous section.
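The behavior and the sanity check can be reproduced from the ideal realization. This is a minimal sketch under the assumptions stated in the text (state $|\Phi^+\rangle$, Pauli measurements); it uses the identity $\langle\Phi^+|A \otimes B|\Phi^+\rangle = \frac{1}{2}\sum_{ij} A_{ij}B_{ij}$, valid for real matrices, and hypothetical variable names.

```python
import math

R = 1.0 / math.sqrt(2.0)
SZ = [[1.0, 0.0], [0.0, -1.0]]
SX = [[0.0, 1.0], [1.0, 0.0]]
ID = [[1.0, 0.0], [0.0, 1.0]]

def corr(A, B):
    """<Phi+| A (x) B |Phi+> = (1/2) * sum_ij A_ij * B_ij for real 2x2 matrices."""
    return 0.5 * sum(A[i][j] * B[i][j] for i in range(2) for j in range(2))

A = [SZ, SX]
B = [SZ, SX, [[R * (SZ[i][j] + SX[i][j]) for j in range(2)] for i in range(2)]]

marg_A = [corr(Ax, ID) for Ax in A]          # <A_x>
marg_B = [corr(ID, By) for By in B]          # <B_y>
E = [[corr(Ax, By) for By in B] for Ax in A]  # <A_x B_y>

def prob(a, b, x, y):
    """P(a,b|x,y) from marginals and correlators, outcomes a, b in {-1, +1}."""
    return 0.25 * (1 + a * marg_A[x] + b * marg_B[y] + a * b * E[x][y])

chsh = E[0][2] + E[0][0] + E[1][2] - E[1][0]
print("correlators:", E)
print("CHSH combination:", chsh, " vs sqrt(2) + 1 =", math.sqrt(2.0) + 1.0)
```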


7.2.1 Inferences from the behavior 


Just as in the previous section, we work with projective measurements without loss of 
generality. Also, here the proof is more transparent by working with a pure state (without 
loss of generality, as explained in subsection 6.2.1). 

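Thus we read the $\langle A_xB_y\rangle$ as $\langle\Psi|A_xB_y|\Psi\rangle$. The condition (7.17) can be rewritten as


$$A_0|\Psi\rangle = B_0|\Psi\rangle \quad\text{and}\quad A_1|\Psi\rangle = B_1|\Psi\rangle. \tag{7.20}$$


Inserting this into (7.18), we find $\langle\Psi|A_0A_1|\Psi\rangle = 0$, that is $A_1|\Psi\rangle$ is orthogonal to $A_0|\Psi\rangle$.
If this is the case, then (7.19) means that


$$B_2|\Psi\rangle = \frac{A_0 + A_1}{\sqrt{2}}|\Psi\rangle: \tag{7.21}$$


indeed, $B_2|\Psi\rangle$ must be of norm 1, and we know already two of its projections of
amplitude $\frac{1}{\sqrt{2}}$ on two orthogonal vectors. The last preparatory step consists in computing
$B_2^2|\Psi\rangle$. On the one hand, we know that $B_2^2 = I_{d_B}$; on the other hand, using $[A_x, B_y] = 0$
we obtain³


$$B_2^2|\Psi\rangle = B_2\,\frac{A_0 + A_1}{\sqrt{2}}|\Psi\rangle = \frac{A_0 + A_1}{\sqrt{2}}\,B_2|\Psi\rangle = \Big(\frac{A_0 + A_1}{\sqrt{2}}\Big)^2|\Psi\rangle = \frac{1}{2}\big(A_0^2 + A_1^2 + \{A_0, A_1\}\big)|\Psi\rangle.$$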


2 Without Bob’s third input y = 2, the behavior becomes local. That behavior self-tests the singlet if one
assumes a priori the systems to be qubits (Appendix F.1.3).

3 Notice that, from an equation of the form $C|\Psi\rangle = D|\Psi\rangle$ like (7.21), the inference $C^2|\Psi\rangle = D^2|\Psi\rangle$ follows
automatically only if $C|\Psi\rangle \propto |\Psi\rangle$, which we cannot guarantee in a device-independent setting (and in fact, it
would be wrong if one replaces the ideal states and operators). This is why we have to prove that $C^2|\Psi\rangle = D^2|\Psi\rangle$
holds here, by using the commutation relations.
Since $A_0^2 = A_1^2 = I_{d_A}$, we find finally


$$\{A_0, A_1\}|\Psi\rangle = 0 \ \ \overset{(7.20)}{\Longrightarrow}\ \ \{B_0, B_1\}|\Psi\rangle = 0. \tag{7.22}$$


We can now concentrate on inputs x, y = 0, 1: Bob’s third measurement y = 2 has
fulfilled its role. In fact (7.22) are the same conditions as we found in the previous
section by reasoning on the algebra behind $S = 2\sqrt{2}$. At this point, we could finish the
proof by following the same pattern: Invoke Jordan’s lemma and solve an easy two-qubit
problem. Instead, we take a more complicated approach, which however can lead to
generalizations; in fact, it anticipates the formal definition of self-testing that will be
given in subsection 7.3.1.


7.2.2 Swapping the relevant qubit into a controlled one 


Assume self-testing indeed holds: On Alice’s side there is then one qubit that is maximally
entangled with one on Bob’s side, and the operators $A_0$ and $A_1$ will act non-trivially only
on the qubit support of the state. We can now imagine the following virtual protocol: Alice
prepares a controlled qubit A' in a dummy state, say $|0\rangle$, then couples it to her system in
the attempt of swapping the state of the relevant qubit and that of the controlled qubit.
If Bob does the same and the swap is correctly implemented, at the end of these local
operations we expect to find the state $|\Phi^+\rangle_{A'B'}$ in the controlled qubits.

A possible construction of the operator that swaps the state of two qubits is shown
in Figure 7.1, left. For the system under study, we do not have the Pauli operators, but
we surmise that $A_0$ and $B_0$ act as $\sigma_z$, $A_1$ and $B_1$ as $\sigma_x$, on the relevant qubits. So the
virtual protocol is implemented as the local isometry $\Phi = \Phi_A \otimes \Phi_B$ represented in Figure
7.1, right. Finally, let us show that the virtual protocol just defined implements indeed
the swap of the relevant qubits if (7.20) and (7.22) hold. Omitting the tensor product
symbol for simplicity:


Figure 7.1 Left: Quantum circuit representation of the swap gate for two qubits. Left, top: The usual
representation as three alternate CNOT gates: For a given control c and target t, the gate is defined as
$U_{ct} = |0\rangle\langle 0|_c \otimes I_t + |1\rangle\langle 1|_c \otimes (\sigma_x)_t$, i.e., the gate acts non-trivially only if the control is in the state $|1\rangle$. Left,
bottom: An equivalent version of the central CNOT gate, where H is the Hadamard unitary gate defined
as usual: $H|0\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, $H|1\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$. Right: The corresponding isometry in the
context of Mayers-Yao self-testing. Having chosen the auxiliary qubits A' and B' in the state $|0\rangle$, the
first controlled gate will act trivially and can therefore be dispensed with. In the remaining two controlled
gates, $\sigma_z$ is replaced by $A_0$ ($B_0$) and $\sigma_x$ is replaced by $A_1$ ($B_1$).


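$$\begin{aligned} \Phi\,|\Psi\rangle_{AB}|00\rangle_{A'B'} = \frac{1}{4}\Big[ & \big[(I + A_0)(I + B_0)|\Psi\rangle\big]\,|00\rangle \\ & + \big[A_1B_1(I - A_0)(I - B_0)|\Psi\rangle\big]\,|11\rangle \\ & + \big[B_1(I + A_0)(I - B_0)|\Psi\rangle\big]\,|01\rangle \\ & + \big[A_1(I - A_0)(I + B_0)|\Psi\rangle\big]\,|10\rangle \Big]. \end{aligned} \tag{7.23}$$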


First, using (7.20), we replace $B_0$ with $A_0$, and this cancels the third and fourth
lines because $(I + A_0)(I - A_0) = I - A_0^2 = 0$. Then, using (7.22), one proves that
$A_1B_1(I - A_0)(I - B_0)|\Psi\rangle = (I + A_0)(I + B_0)A_1B_1|\Psi\rangle$, which is in turn equal to
$(I + A_0)(I + B_0)|\Psi\rangle$ because of (7.20). Finally $(I + A_0)(I + B_0)|\Psi\rangle = (I + A_0)^2|\Psi\rangle =
2(I + A_0)|\Psi\rangle$. So we have found


$$\Phi\,|\Psi\rangle_{AB}|00\rangle_{A'B'} = \Big(\frac{I + A_0}{\sqrt{2}}|\Psi\rangle\Big)_{AB} \otimes |\Phi^+\rangle_{A'B'} \tag{7.24}$$


which is the self-testing of the state.
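The four branches of the isometry output can be evaluated explicitly in the ideal case. The sketch below assumes the ideal realization ($|\Psi\rangle = |\Phi^+\rangle$, $A_0 = \sigma_z \otimes I$, $A_1 = \sigma_x \otimes I$, $B_0 = I \otimes \sigma_z$, $B_1 = I \otimes \sigma_x$) and an overall prefactor 1/4, which comes from the two Hadamard gates on each side. It is a consistency check of the algebra, not a simulation of unknown devices.

```python
import math

R = 1.0 / math.sqrt(2.0)
I2 = [[1.0, 0.0], [0.0, 1.0]]
SZ = [[1.0, 0.0], [0.0, -1.0]]
SX = [[0.0, 1.0], [1.0, 0.0]]

def kron(A, B):
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def shifted(M, sign):
    """Return I + sign*M."""
    n = len(M)
    return [[(1.0 if i == j else 0.0) + sign * M[i][j] for j in range(n)] for i in range(n)]

def apply(ops, v):
    """Apply the operator product ops[0] ops[1] ... ops[-1] to v (rightmost first)."""
    for M in reversed(ops):
        v = matvec(M, v)
    return v

A0, A1 = kron(SZ, I2), kron(SX, I2)
B0, B1 = kron(I2, SZ), kron(I2, SX)
psi = [R, 0.0, 0.0, R]   # ideal state |Phi+> on AB

# Unnormalized vectors on AB attached to each state of the auxiliary qubits A'B'.
branches = {
    "00": apply([shifted(A0, +1), shifted(B0, +1)], psi),
    "11": apply([A1, B1, shifted(A0, -1), shifted(B0, -1)], psi),
    "01": apply([B1, shifted(A0, +1), shifted(B0, -1)], psi),
    "10": apply([A1, shifted(A0, -1), shifted(B0, +1)], psi),
}
branches = {k: [x / 4.0 for x in v] for k, v in branches.items()}
for key in ("00", "11", "01", "10"):
    print(key, branches[key])
```

The branches 01 and 10 vanish while 00 and 11 carry the same vector on AB, so the auxiliary qubits end up in $|\Phi^+\rangle$, as stated.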

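The proof for the measurements follows the same steps, starting with $\Phi A_xB_y|\Psi\rangle_{AB}
|00\rangle_{A'B'}$ instead of (7.23). Let me show it for one of the six cases (for clarity, I underline
the operator under study):


$$\begin{aligned} \Phi\,\underline{A_1}|\Psi\rangle_{AB}|00\rangle_{A'B'} = \frac{1}{4}\Big[ & \big[(I + A_0)(I + B_0)\underline{A_1}|\Psi\rangle\big]\,|00\rangle \\ & + \big[A_1B_1(I - A_0)(I - B_0)\underline{A_1}|\Psi\rangle\big]\,|11\rangle \\ & + \big[B_1(I + A_0)(I - B_0)\underline{A_1}|\Psi\rangle\big]\,|01\rangle \\ & + \big[A_1(I - A_0)(I + B_0)\underline{A_1}|\Psi\rangle\big]\,|10\rangle \Big]. \end{aligned}$$
By using (7.22), one moves $A_1$ to the left in the first, second, and fourth lines while
changing the sign of $A_0$, and $B_1$ to the right in the third while changing the sign of $B_0$.
After simplifications similar to those made previously, the final result is


$$\Phi\,\underline{A_1}|\Psi\rangle_{AB}|00\rangle_{A'B'} = \Big(\frac{I + A_0}{\sqrt{2}}|\Psi\rangle\Big)_{AB} \otimes \big(\underline{\sigma_x} \otimes I\,|\Phi^+\rangle\big)_{A'B'}. \tag{7.25}$$


Notice that the state left in the AB system is the same for both the self-testing of the state
and of the measurements.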
7.3 Formal Definition and its Consequences


7.3.1 Formal definition of self-testing 


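Having gained confidence with self-testing by dealing with two examples, we can now
state the formal definition of self-testing. We write it for bipartite Bell scenarios; the
extension to multipartite scenarios is obvious.

The behavior P self-tests the state $|\psi\rangle$ and the families of measurements $\{\bar\Pi^x_a \,|\, x \in
\mathcal{X}, a \in \mathcal{A}\}$ and $\{\bar\Pi^y_b \,|\, y \in \mathcal{Y}, b \in \mathcal{B}\}$ if the following holds: For every quantum realization
$P(a,b|x,y) = \mathrm{Tr}(\rho\,\Pi^x_a \otimes \Pi^y_b)$ of the behavior, there exists a local isometry $\Phi = \Phi_A \otimes \Phi_B$
such that


$$\Phi[\rho] = |\psi\rangle\langle\psi|_{A'B'} \otimes \sigma_{A''B''} \tag{7.26}$$
$$\Phi[\Pi^x_a] = \big[(\bar\Pi^x_a)_{A'} \otimes I_{A''}\big] \oplus (\tilde\Pi^x_a)_{A'''} \tag{7.27}$$
$$\Phi[\Pi^y_b] = \big[(\bar\Pi^y_b)_{B'} \otimes I_{B''}\big] \oplus (\tilde\Pi^y_b)_{B'''} \tag{7.28}$$


where the tilde operators are arbitrary. The isometries have been defined as $\Phi_K: \mathcal{H}_K \to
(\mathcal{H}_{K'} \otimes \mathcal{H}_{K''}) \oplus \mathcal{H}_{K'''}$, K = A, B, where the dimension of $\mathcal{H}_{K'}$ is fixed by the state and
measurement that are self-tested, while the dimensions of $\mathcal{H}_{K''}$ and $\mathcal{H}_{K'''}$ are arbitrary.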

Even if it should be clear, let me stress that an experimental demonstration of self- 
testing does not require setting up the local isometries. It is sufficient to perform a normal 
Bell test and record the behavior, all the rest are mathematical inferences drawn from that 
observation. Let me also insert two technical remarks: 


• The two examples of proofs that we presented differ in one respect: In section
7.1 K' and K'' appeared as subsystems of the original K, whereas in section 7.2 the
K' are auxiliary systems and the K'' are the original systems K. Only in the former
case self-testing can be described as “swapping out of the box” the relevant degrees
of freedom. For the formal definition of self-testing, we retained the more general
definition of isometry, which is the second.⁴


• Given a quantum realization of a behavior, any realization related to it by an
isometry, not just a local one, will give the same behavior. Besides, if one were to
identify subsystems with algebras of commuting observables (see subsection 6.2.2),
it would be hard to define “local” isometries, because $[A, B] = 0$ implies $[\Phi[A],
\Phi[B]] = 0$ for any isometry. In fact, if the goal is to define the set of behaviors

4 The most striking example of the power of auxiliary systems is a very counter-intuitive result in quantum 
channel theory. One would expect an ancilla prepared in the maximally mixed state to describe a source 
of classical randomness; thus, channels that use such an ancilla should be equivalent to mixtures of unitary 
operations on the system. But this is not the case: There exist unital channels that cannot be represented as 
convex mixtures of unitary operations (Haagerup and Musat, 2011). 
that lead to self-testing, the locality requirement can be dropped from the isometry.
However, locality must be enforced if self-testing is seen as DI certification of given 
black boxes, because under global isometries the state that gives the behavior could 
be in the auxiliary systems. Besides, if not based on locality, the assessment of [A, 
B] = 0 requires characterization of the devices. 


7.3.2 Which behaviors, and which states 


Next, we are going to see which behaviors may lead to self-testing, and which states can
be the object of self-testing [see (Šupić and Bowles, 2019) for a more comprehensive
review].

It was proved that only behaviors that are extremal points of Q may lead to self-testing
(Goh et al., 2018). One may conjecture that all the extremal points of Q in any Bell
scenario self-test some state, but there is no proof for this. Recall that several behaviors
may self-test the same state, as we have seen the two-qubit maximally entangled state
being self-tested by $S = 2\sqrt{2}$, by the Mayers-Yao correlations, and in fact by all the
nonlocal points on the TLM boundary (6.5) (Tsirel’son, 1987; Wang et al., 2016).

It follows from this that mixed states cannot be self-tested: Behaviors obtained from
them are not extremal in the quantum set, since they can be written as the convex mixture
of behaviors obtained from pure states with the same measurements. This observation
motivates a posteriori our defining self-testing with pure states.⁵

It is known that all pure bipartite entangled states can be self-tested (Coladangelo et al.,
2017a). Several examples of self-testable multipartite pure entangled states are known too:
The GHZ states and in fact all graph states, the three-qubit $|W\rangle$ state and several of its
generalizations, and others [see (Šupić et al., 2018) and references therein]. For generic
multipartite states, however, self-testing may have to be defined in a more careful way, as
there exist states that are not equivalent under local unitaries but whose state-behavior is
identical.⁶


7.4 Approximate Self-Testing: Robustness Bounds 


As we have just seen, self-testing per se is a property of extremal behaviors and pure states. 
These are theoretical ideal cases. For self-testing to enter the realm of device certification, 
one must be able to make statements on any observed behavior. For instance, if we
observe $S = 2\sqrt{2} - \varepsilon$, we should be able to estimate how close the actual state is to the


5 Besides, a behavior that can be obtained with a mixed state can also be obtained with its purification, using
measurements that act trivially on the purifying degrees of freedom. Thus, as a corollary, we know that if a
behavior self-tests a pure state $|\psi\rangle$, the systems A' and B' cannot be further decomposed in sub-systems, such
that the measurements would act trivially on one of them.

6 If a behavior can be obtained with state $|\psi\rangle$ and some measurements, that same behavior can also be
obtained with the state $|\psi\rangle^*$, defined as taking the complex conjugation of the coefficients in some basis,
and the similarly conjugated measurements. For bipartite systems, $|\psi\rangle$ and $|\psi\rangle^*$ are always equivalent under
local unitaries, the canonical form being the Schmidt decomposition whose coefficients are real and positive.
However, this equivalence no longer holds for any systems composed of three or more subsystems (Acín et al.,
2001). So, at best, one can hope to self-test such states up to complex conjugation.
ideal one. Such an estimate is called a robustness bound. Let us review quickly some of
the results, referring to the original works for more details. It must be kept in mind that
all robustness calculations rely on choosing a form of the local isometry $\Phi$ with which to do
the calculation. By default, one chooses the isometry of Figure 7.1, even if it won’t be a 
swap in the non-ideal case and other isometries may be found that give better bounds 
(Bancal et al., 2015). 


7.4.1 Generic analytical bounds with poor robustness 


The figure of merit for robustness was initially defined as (McKague et al., 2012; 
Reichardt et al., 2013) 


A= |Y) — |v) ap @ |W) aval (7.29) 


where the unknown state is supposed to be pure. The calculation amounts to chaining
inequalities (triangle, Cauchy-Schwarz ...) and can be applied in principle to any
example of self-testing; the resulting bounds are analytical expressions. The problem
of these bounds is their very poor tolerance. For instance, for $S = 2\sqrt{2} - \varepsilon$, the estimate
of (McKague et al., 2012) yields


A<ll (ev2) + 10(ev2) (7.30) 


As a benchmark, let us consider the situation in which the isometry does not couple the
auxiliary systems to the unknown system, that is if $\Phi|\Psi\rangle = |00\rangle_{A'B'} \otimes |\Psi\rangle_{A''B''}$. For this
state one finds $\Delta = \sqrt{2 - \sqrt{2}}$, which is reached already for $\varepsilon \approx 1.8 \times 10^{-5}$. For higher
values of $\varepsilon$, this criterion cannot guarantee anything non-trivial. While this is sufficient
to establish robustness in principle, much better tolerance is needed to make meaningful
self-testing claims on experimental data.


7.4.2 Generic SDP techniques 


The desired better tolerance has been obtained by resorting to SDP optimizations based
on the NPA hierarchy. Here, the figure of merit should be linear in the behavior in order
to play the role of objective function in the SDP. The natural choice is to minimize the
fidelity $F = \langle\psi|\rho_{A'B'}|\psi\rangle$ of the reduced state of the auxiliary systems with the ideal output
state. Given the expression of F, the SDP needs to find the minimal value of F compatible
with the observed behavior and for some choice of the moment matrix $\Gamma$. The fidelity F
is a sum of terms, some of which may be determined by the behavior, while the others
must appear as semi-definite variables in the moment matrix. Still, the choice of $\Gamma$ adds
to the choice of the isometry in making the bound possibly not tight.

For the sake of an example, let us consider the Mayers-Yao criterion. We can find the
expression of the fidelity starting from (7.23):

$$\begin{aligned}
F &= \Big\| \frac{1}{4\sqrt{2}} \big[(\mathbb{1}+A_0)(\mathbb{1}+B_0) + A_1B_1(\mathbb{1}-A_0)(\mathbb{1}-B_0)\big] |\Psi\rangle \Big\|^2 \\
&= \frac{1}{16} \big[4 + 4\langle A_0B_0\rangle + \langle A_1B_1\rangle + \langle [A_0,A_1][B_0,B_1]\rangle \\
&\qquad - \langle A_1(B_0B_1B_0)\rangle - \langle (A_0A_1A_0)B_1\rangle + \langle (A_0A_1A_0)(B_0B_1B_0)\rangle\big]
\end{aligned} \tag{7.31}$$

where $\langle C\rangle = \langle\Psi|C|\Psi\rangle$. The values of the terms $\langle A_0B_0\rangle$ and $\langle A_1B_1\rangle$ are determined
by the behavior; the other terms are variables. For the CHSH criterion, the expression
of the fidelity is different (Exercise 7.2). If in the SDP the only constraint coming from
the behavior is $S = 2\sqrt{2} - \varepsilon$, the benchmark value $F = \frac{1}{2}$ compatible with the output
state $|00\rangle_{A'B'}$ is reached for $\varepsilon \approx 0.4$. This tolerance may be improved by using the whole
behavior as constraint, as well as by changing the isometry (Bancal et al., 2015).
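
The equality between the two lines of (7.31) uses only that the four operators square to the identity and that Alice's operators commute with Bob's. The following is a minimal numerical sketch of that identity, not part of the original derivation: it assumes qubit observables drawn at random (an illustrative special case), and all function names are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def random_observable(rng):
    """Random qubit observable n.sigma: Hermitian, eigenvalues +1 and -1."""
    n = rng.normal(size=3)
    n /= np.linalg.norm(n)
    return n[0] * X + n[1] * Y + n[2] * Z

def fidelity_from_norm(A0, A1, B0, B1, psi):
    """First line of (7.31): squared norm of the projected vector."""
    I4 = np.eye(4, dtype=complex)
    a0, a1 = np.kron(A0, I2), np.kron(A1, I2)
    b0, b1 = np.kron(I2, B0), np.kron(I2, B1)
    op = (I4 + a0) @ (I4 + b0) + a1 @ b1 @ (I4 - a0) @ (I4 - b0)
    vec = op @ psi / (4 * np.sqrt(2))
    return np.vdot(vec, vec).real

def fidelity_from_moments(A0, A1, B0, B1, psi):
    """Second line of (7.31): sum of expectation values <C> = <psi|C|psi>."""
    def ev(A, B):
        return np.vdot(psi, np.kron(A, B) @ psi)
    comm_a = A0 @ A1 - A1 @ A0
    comm_b = B0 @ B1 - B1 @ B0
    flip_a = A0 @ A1 @ A0
    flip_b = B0 @ B1 @ B0
    total = (4 + 4 * ev(A0, B0) + ev(A1, B1) + ev(comm_a, comm_b)
             - ev(A1, flip_b) - ev(flip_a, B1) + ev(flip_a, flip_b))
    return (total / 16).real

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    psi = rng.normal(size=4) + 1j * rng.normal(size=4)
    psi /= np.linalg.norm(psi)
    ops = [random_observable(rng) for _ in range(4)]
    print(fidelity_from_norm(*ops, psi), fidelity_from_moments(*ops, psi))
```

In the ideal case ($A_0 = B_0 = \sigma_z$, $A_1 = B_1 = \sigma_x$ on the state $(|00\rangle + |11\rangle)/\sqrt{2}$), both functions should return $F = 1$.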

This approach through SDPs can also be applied in principle to any example of self- 
testing. However, moving away from the basic examples, the number of terms in the 
expression of F grows quickly, and so does the SDP size. Besides, if the natural operators 
that define the correlations are not those that define a swap and/or do not guarantee 
unitarity, additional semi-definite constraints must be used. In practice, the self-testing 
of the maximal violation of the CGLMP inequality for m = 3 (Bancal et al., 2015), or the 
self-testing of two singlets using the Magic Square game (Wu et al., 2016), are already 
very cumbersome and computationally heavy. 


7.4.3 A highly robust analytical result for specific cases 


Analytical bounds with even better tolerance than the SDP method have been obtained
more recently for Bell scenarios with two inputs and two outputs per player (Kaniewski,
2016). For the bipartite case, which is CHSH, the lower bound on the fidelity is given by

$$F(S) \ge \frac{1}{2} + \frac{1}{2}\,\frac{S - S^*}{2\sqrt{2} - S^*}, \quad \text{with } S^* = \frac{16 + 14\sqrt{2}}{17} \approx 2.11. \tag{7.32}$$

In other words, the value $F = \frac{1}{2}$ is reached for $\varepsilon = 2\sqrt{2} - S^* \approx 0.72$. At the moment of
writing, we do not know if this bound is tight, that is, if there is a family of states that
saturates it (Exercise 7.3).
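
The numbers quoted around (7.32) can be checked with a few lines of arithmetic. This is only a sketch that evaluates the right-hand side of (7.32); the function and constant names are illustrative.

```python
import math

S_MAX = 2 * math.sqrt(2)                   # maximal quantum value of CHSH
S_STAR = (16 + 14 * math.sqrt(2)) / 17     # threshold S* of (7.32)

def fidelity_lower_bound(S):
    """Right-hand side of (7.32): linear interpolation between (S*, 1/2) and (2*sqrt(2), 1)."""
    return 0.5 + 0.5 * (S - S_STAR) / (S_MAX - S_STAR)

if __name__ == "__main__":
    print("S* =", S_STAR)
    print("tolerance 2*sqrt(2) - S* =", S_MAX - S_STAR)
    for S in (S_STAR, 2.5, 2.7, S_MAX):
        print(S, fidelity_lower_bound(S))
```

The bound is linear in $S$, so every loss of $\varepsilon$ in the CHSH value costs the same amount of certified fidelity.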


EXERCISES 


Exercise 7.1. The “linear cluster state” of N qubits is a pure state that is not product in any
partition; in particular, all n-qubit partial states are mixed unless n = N. For N > 5, the partial
state of any five consecutive qubits is such that the following expectation values hold:

$$\langle IZXZI\rangle = \langle ZYYZI\rangle = \langle IZYYZ\rangle = -\langle ZYXYZ\rangle = +1 \tag{7.33}$$

where $I = \mathbb{1}$, $Z = \sigma_z$ and $X = \sigma_x$.
(a) Prove that the set of equations (7.33) defines a GHZ-type argument for nonlocality:
Thus, one can have GHZ arguments for mixed states (Scarani et al., 2005).
(b) With a suitable grouping of some of the players, this becomes the usual 3-partite GHZ
argument, which is known to self-test the state $|GHZ_3\rangle$. Prove that, under local isometries
for that grouping, the equations (7.33) are equivalent to the set of eigenvalue equations
that define that state (5.14).
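
Before attempting the proofs, the expectation values (7.33) can be checked numerically. The sketch below is not a solution of the exercise: it only verifies (7.33) on the smallest instance, the five-qubit linear cluster state, built under the standard assumption that the state is obtained from $|+\rangle^{\otimes 5}$ by applying controlled-Z gates between neighbours. Function names are hypothetical.

```python
from functools import reduce
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_string(label):
    """Tensor product of single-qubit Paulis, e.g. 'IZXZI'."""
    return reduce(np.kron, [PAULI[ch] for ch in label])

def linear_cluster_state(n):
    """|+>^n followed by a controlled-Z on every pair of neighbouring qubits."""
    plus = np.ones(2, dtype=complex) / np.sqrt(2)
    state = reduce(np.kron, [plus] * n)
    for idx in range(2 ** n):
        bits = [(idx >> (n - 1 - k)) & 1 for k in range(n)]
        # Each CZ contributes a factor -1 when both neighbouring bits are 1.
        state[idx] *= (-1) ** sum(bits[k] * bits[k + 1] for k in range(n - 1))
    return state

def expectation(label, state):
    return np.vdot(state, pauli_string(label) @ state).real

if __name__ == "__main__":
    psi = linear_cluster_state(5)
    for label in ("IZXZI", "ZYYZI", "IZYYZ", "ZYXYZ"):
        print(label, round(expectation(label, psi), 10))
```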


Exercise 7.2. Prove that, for the usual choice of isometry (Figure 7.1), the singlet fidelity for
self-testing using CHSH is given by

$$F = \Big\| \frac{1}{4\sqrt{2}} \big[(c+s)(\mathbb{1}+A_0)(\mathbb{1}+B_0) + (c-s)A_1B_1(\mathbb{1}-A_0)(\mathbb{1}-B_0)\big] |\Psi\rangle \Big\|^2 \tag{7.34}$$

with $c = \cos\frac{\pi}{8}$ and $s = \sin\frac{\pi}{8}$. In particular, it is different from the Mayers-Yao expression
(7.31). Hint: The isometry assumes that $A_0 = B_0 = \sigma_z$ and $A_1 = B_1 = \sigma_x$ in the ideal case.
For this choice of operators, the state that gives $S = 2\sqrt{2}$ is $c\,|\Phi^+\rangle + s\,|\Psi^-\rangle$.


Exercise 7.3. Suppose that the state being measured is a two-qubit Werner state (3.23). 
Compare the singlet fidelity that one would obtain after characterizing the state with the DI 
self-testing bound (7.32) computed assuming that the observed S is the maximal value of 
CHSH achievable with that state. 


8 


Certifying Randomness 


Esse igitur contra rationem providentiae et perfectionis rerum si non essent aliqua 
casualia. 


So, it would be contrary to providence and to the perfection of things if there were no 
chance events. 
Thomas Aquinas, Summa contra Gentiles 


If nonlocality is observed, the outputs of the process did not pre-exist—in other words, 
the process is random. In this respect, the quantification of the amount of generated 
randomness is the most natural example of device-independent certification. This 
chapter is devoted to it. 


8.1 Introduction to Randomness 


The word “randomness” evokes notions that span across metaphysics, anthropology, history,
mathematics, and computer science: Chance, free will, gambling, secrecy, statistics,
zero-knowledge proofs ... Delving into these cultural riches is exciting, but would distract
us from the topic of this book. Our much more modest goal in this chapter is to discuss the
role of nonlocality in certifying randomness. Even in this restrictive setting, a few concepts 
must be introduced. For two alternative introductions to the same topic, the reader can 
consult the review articles (Pivoluska and Plesch, 2014; Acin and Masanes, 2016). 


8.1.1 Product and process randomness 


First of all, randomness is studied in the context of processes that produce strings of numbers. 
One speaks both of random processes and of random numbers. The corresponding notions 
of randomness are called process randomness and product randomness (the randomness of 
the “product,” that is, of the string of numbers). 

We expect these two notions to be tightly connected: Random processes are often 
called random number generators (RNGs). Nonetheless, they are different, and there are 
situations in which the difference is clear. On the one hand, a random process may 
occasionally output a number that nobody would call random, for instance a string 
of zeros. On the other hand, reading pre-recorded information would not be called a
random process, even if the pre-recorded information is a random number. 

Intuitively, a string of numbers will be called “random” if no pattern can be detected 
in it. Thus, product randomness is lack of pattern. Formalizing this intuition has proved
challenging. Most of today’s definitions are variants of the pioneering proposal of Per
Martin-Löf, who in 1966 defined the randomness of a sequence in the language of
algorithmic complexity. Without entering into the details, one ignores the actual process 
and defines randomness on a class of simulators, which are the algorithms that could have 
produced the sequence. In practice, product randomness is assessed through a battery 
of statistical tests. The product randomness of the outputs of a Bell test can of course be 
studied, but it doesn’t have any special feature. 

By contrast, a process is called “random” if one cannot predict how it will unfold. 
In other words, process randomness is unpredictability. Now we can make more precise 
the tight connection between process and product randomness: Unpredictability is not 
the same as lack of regularity, but a process that most of the time produces a highly regular
output will also be highly predictable. Conversely therefore, an unpredictable process will
most likely produce a sequence that lacks regularity. Importantly, process randomness
will almost always imply product randomness,¹ while the converse does not hold, as the
example of the pre-recorded information shows. The reader can now guess which is
nonlocality’s unique contribution: The possibility of certifying process randomness in a DI way,
that is, without characterizing the process itself.

The definition of process randomness calls for a predictor. The question “random for 
whom?” must always be addressed and is crucial to avoid confusions. Having noticed this, 
positing a proper formalization of process randomness is immediate: Randomness will 
be quantified by the probability of the predictor guessing the output. 


8.1.2 Random for whom? The predictor 


In a book devoted to nonlocality, defining the predictor may look like an easy task: Have 
we not proved that the outputs of some natural processes are random for everyone? 
A quick thought back to section 1.6 shows that “everyone” is rather ill-defined. For 
instance, any deterministic interpretation will be able to describe a hypothetical agent for 
whom there is no randomness. In the Bohmian interpretation, that would be an agent who 
has access to both the initial conditions and the quantum potential with infinite precision; 
in the many-worlds interpretation, an agent who perceives the whole multiverse and not 
just one of its branches. Also, there can’t be randomness for a hypothetical agent who
sees what for us is “the future” and has therefore nothing to predict.² So, a definition of
the predictor is indeed needed. In this subsection and the next, we go through a list of
characteristics, which are summarized in Table 8.1.


1 A formalized version of this statement is called Yao’s theorem, see Theorem 11.9 in (Arora and Barak, 
2009). 

2 This does not seem to prevent some people, who believe exactly that of God, from asking me quite often 
if quantum phenomena are random for Him. 

Table 8.1 The characteristics of the adversary Eve in this text. They capture a scenario of generation
of secret randomness inside a secure location.

| Characteristics of Eve | Remarks |
|---|---|
| Is outside the secure location and has not tampered with anything inside it. | |
| Knows with certainty all that happens inside the secure location, apart from the outputs of the process under study (whose randomness has to be assessed) and possibly the content of a string of fixed finite length. | The fixed-length string is needed in the framework of randomness expansion and for QKD (section 8.4). |
| Has unbounded computational power and laboratory facilities. | That is, she can run perfect simulations of the process, either digital or analog. |
| Has the most precise description of the process under consideration using quantum theory. | In particular, she may have a pure-state description of each round of the process. Further relaxed in section 9.3. |
| Does not have quantum side-information. | Natural in the case of a single secure location. Most of the specialized literature gives quantum side-information to Eve (references in Appendix E2.1). |


First, we assume that the predictor knows a perfect description of the process and may 
run perfect analog or digital simulations with unbounded computational power. Such a 
predictor is not limited by what we describe as ignorance: If we describe the process with 
a mixed state, she may have a pure-state description of every round. She may also know 
exactly the misalignment of our devices, in which rounds the detectors in the lab will not 
fire... 

But what is a “perfect” description, and which technologies can the predictor master? 
We assume that the predictor bases her knowledge and technologies on quantum theory (this 
will be relaxed in section 9.3). 

An even more powerful predictor model has been considered in the literature: One
that may hold the degree of freedom that purifies the state used in the process (the
“purification”). In the jargon, we say that such a predictor has access to quantum
side-information, while our predictor has only classical (though perfect) side-information. To
appreciate why quantum side-information may help the predictor, we need to recall that
randomness is not generated for the sake of it, but as a resource in further tasks. The
predictor may not be interested in the whole string of numbers: She may want to predict
only those numbers that are used in a specific task, which may take place after some
parts of the original string have been used or revealed. With classical side-information,
the predictor can only update her knowledge based on Bayesian reasoning; holding on
to quantum side-information, she may be able to make a better guess.


8.1.3 The notion of secret randomness


Having defined process randomness, with the correlated notion of a predictor, the next 
distinction comes from the use to which the generated string is put. For some tasks, 
e.g., sampling and Monte Carlo optimization, the only property of randomness that is 
required is the lack of regularity. For such tasks, certifying product randomness would 
be sufficient, although certifying also process randomness cannot harm. By contrast, 
in cryptographic tasks the predictor is an adversary (traditionally called Eve). Process 
randomness is then necessary, because it guarantees that the generated string is still 
random for anyone who has not seen it. One then speaks of secret randomness. 

Now, the possibility of secret randomness implies that the adversary should not be 
able to see the generated string. This requires the process to happen in a secure location 
to which the adversary has no access, be it by physical presence, or by hacking, or by the 
leakage of information, or by Trojan horses (Figure 8.1). 

The necessity of postulating a secure location calls for two important remarks. First,
one may ask: With Eve believed to be firmly and forever outside the single secure location,
isn’t just anything that happens inside “random for her”? This doubt stems from the most
typical fallacy in dealing with randomness: Taking for granted that “things happen”,
i.e., that plenty of events are unpredictable for Eve. Postulating the existence of random
processes inside the secure location would defeat the purpose of certifying another such
process. At times one postulates the existence of an initial secret, a string of fixed length
that Eve does not know (of course, she knows that it exists and how it will be used). For the
rest, apart from the outputs of the one process whose randomness we want to certify, all
that takes place inside the secure location is supposed to be known to Eve, including the
possible actions of human actors.


Figure 8.1 Secret randomness requires secure locations. Top: The natural setting for randomness
generation is that of a single secure location. The adversary Eve knows a perfect description of all the
processes happening inside the location, but cannot tamper with them; the goal is to guarantee that (a, b)
are random for Eve. Bottom: The setting for quantum key distribution involves two secure locations. In
the space between the two, Eve can only listen to the communication on the authenticated channel but
can tamper with everything else. In particular, since she can modify the state, the inputs x and y must
also be random for her: In this setting, one speaks of randomness (or secret key) expansion.

Second, the security of a location can never be absolutely guaranteed. This seems at 
odds with device-independent certification: Indeed, it has been the object of confusions 
in the literature (Appendix E3). To avoid these confusions, we have to distinguish the 
certification of randomness from that of secret randomness. Process randomness can be 
certified in a device-independent way, based on nonlocality, and we are going to devote this 
chapter to it. But in order to use the generated string in cryptographic tasks (i.e., in 
order to claim that this randomness is secret) one needs to add the assumptions of secure 
location. Some of these assumptions bear on the devices: Notably, the devices must not 
have been fabricated by the adversary, otherwise she might have implemented a Trojan 
horse.³ In this sense, it is fair to say that secret randomness cannot be certified in a
fully DI way; but because the assumptions of secure location are required in any secrecy 
scenario, it is also fair to say that the DI certification of the process minimizes the total 
number of assumptions. 


8.1.4 The disruptive role of nonlocality 


Let us now go through different RNGs that could be in the secure location and discuss 
if they can be certified to be random for our Eve. 

If the RNG is an algorithm, it should be rightly called pseudo-RNG: Obviously it has 
no process randomness for our Eve. The only possible randomness is an initial secret, 
which could be used as input of the algorithm. Nonetheless, however long the output 
string, the randomness remains only that of the initial secret. 
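
A toy illustration of this point, under explicit assumptions: the initial secret is a hypothetical 8-bit seed, and Python's built-in generator stands in for "the algorithm" (it is not a cryptographic generator). Eve knows the algorithm, so she can enumerate all seeds; whatever the length of the output string, her best guess of the whole string succeeds with probability at least $2^{-8}$.

```python
import random

SEED_BITS = 8  # hypothetical length of the initial secret, in bits

def pseudo_rng(seed, n_bits):
    """Deterministic algorithm known to Eve: seed in, n_bits out."""
    gen = random.Random(seed)
    return tuple(gen.getrandbits(1) for _ in range(n_bits))

def guessing_probability(n_bits):
    """Eve's best guess of the full output when the seed is uniform and unknown to her."""
    counts = {}
    for seed in range(2 ** SEED_BITS):
        output = pseudo_rng(seed, n_bits)
        counts[output] = counts.get(output, 0) + 1
    # Eve bets on the output string produced by the largest number of seeds.
    return max(counts.values()) / 2 ** SEED_BITS

if __name__ == "__main__":
    for n_bits in (4, 16, 64, 256):
        print(n_bits, guessing_probability(n_bits))
```

For short outputs the guessing probability is even larger, since many seeds collide on the same string; for long outputs it cannot drop below the value set by the seed.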

If the RNG is a physical process that can be allegedly described with a deterministic 
model, there is still some hope for randomness against our Eve. On the one hand, the 
deterministic model may not be exactly accurate; on the other hand, even if it is accurate, 
it may require an enormous computational power. In both cases, an estimate of process 
randomness can only be based on trust of the modeling: In the first case, to describe 
the non-deterministic correction; in the second case, to be convinced of its irreducible 
complexity (even if one is willing to limit Eve’s computational power, which we won’t do 
here, one should be sure that Eve has not found a simpler description). 

RNGs based on physical processes that require a quantum description are called
quantum random number generators (QRNGs). Here, even in very simple systems there
is randomness for an adversary using quantum theory. Suppose that the accurate
description of the process is a qubit in the state $|{+}\hat{z}\rangle$ being measured in the eigenbasis of
$\sigma_x$: Eve, for all her knowing this, can only despair. However, if the QRNG produces
a local behavior as in this example, the device could contain a classical simulation
(and Eve would know it). Once again, the estimate of process randomness requires
trust in the modeling.⁴ It is only for QRNGs based on nonlocality that a lower bound
on process randomness can be obtained without any need for modeling the process.


3 To paraphrase a pun by Charles Bennett: Even with DI certification, secrecy devices should be DIY
(do-it-yourself). The level of paranoia is a free parameter: For instance, is it enough to assume that the adversary
was not involved in the assembling of the devices? Or should one make sure that even the screws have been
fabricated inside the secure location?

Importantly, a Bell test requires inputs. These inputs must certainly be “random for 
the source” because we are assessing nonlocality assuming measurement independence 
(this will be partially relaxed in chapter 11, but some randomness will have to be kept). 
Even though measurement independence cannot be proved in absolute terms, there are 
many reasonable choices for uncorrelated processes, as discussed in subsection 1.5.3. 

For the case of secret randomness, a subtler issue is whether the inputs of the Bell test 
must be random for Eve: 


• If Eve can prepare the state, tamper with it, or hold quantum side-information,
the inputs must be random for her. One then needs an initial secret in the secure
location, and the process is useful only if more randomness is produced per round
than is consumed to choose the inputs. We speak of randomness expansion.


• As described in Figure 8.1, the natural setting for randomness is that of a single
secure location. In this setting, Eve cannot tamper with the state, and it is rather
natural to assume that she does not hold a purification either. It is then possible to
use inputs that are known to her, and speak of randomness generation.


Most of the literature deals with randomness expansion, because cryptographers find it 
always safe to give more power to the adversary. But randomness generation is simpler to 
address and its adversarial model is very reasonable: So in this book we stick to it, leaving 
a quick review of the other results for Appendix F2.1. 


8.2 Quantification of Randomness 


8.2.1 Guessing probability: Definition 


We formalize the definition of process randomness for i.i.d. processes, leaving for 
subsection 8.2.3 the discussion on how to remove this assumption. 

In this general introduction to randomness, let us call Claire the player instead of
our usual Alice and Bob. Let $C$ be the set of outputs $c$ of the process; its cardinality is
denoted $|C|$. Over several rounds, Claire observes the probability distribution $P(c)$. The
adversary Eve however knows a more precise description $\xi$ of the process in each round,
so her knowledge of the output is described by $P_\xi(c)$. The best guess for Eve is obviously
the most probable output given her knowledge:

$$c(\xi) = \arg\max_c P_\xi(c) \quad \big[\text{i.e., } P_\xi(c(\xi)) = \max_c P_\xi(c)\big]. \tag{8.1}$$


4 This does not mean that there is no advantage in choosing QRNGs: For instance, getting convinced that 
one is measuring the polarization of a photon is probably simpler than certifying that a process is irreducibly 
complex. But ultimately, which RNGs are better will depend on practical criteria: Simplicity, stability, cost, 
speed ... 

Let us denote by $q(\xi)$ the frequency at which the process $\xi$ occurs: The joint probability
distribution between Claire’s and Eve’s symbols is

$$P_{CE}(c, e) = \sum_\xi q(\xi)\, P_\xi(c)\, \delta_{e,c(\xi)}. \tag{8.2}$$

Eve’s average guessing probability will then be given by

$$P_{\text{guess}} = \sum_{c \in C} P_{CE}(c, e = c) = \sum_\xi q(\xi) \max_c P_\xi(c). \tag{8.3}$$

Its maximal value is $P_{\text{guess}} = 1$, obtained if $P_\xi(c) = \delta_{c,c(\xi)}$ for all $\xi$. Its minimal value
is $P_{\text{guess}} = \max_c P(c)$, obtained if the knowledge of $\xi$ does not help Eve’s guessing. The
guessing probability for an $N$-symbol string is denoted $P_{\text{guess}}(N)$. It is equal to $(P_{\text{guess}})^N$
for an i.i.d. process.⁵

The guessing probability is the object that we shall set out to compute, or more 
precisely upper bound, based on nonlocality. Next, let us present its operational meaning. 
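
As a small worked example of (8.3), the sketch below evaluates the guessing probability for three hypothetical decompositions of the same observed distribution, an unbiased bit. The dictionaries `q` and `P_*` and the function names are illustrative choices, not notation from the text.

```python
def claire_distribution(q, P):
    """P(c) = sum_xi q(xi) P_xi(c): the distribution observed by Claire."""
    outputs = {c for dist in P.values() for c in dist}
    return {c: sum(q[xi] * P[xi].get(c, 0.0) for xi in q) for c in outputs}

def guessing_probability(q, P):
    """Eq. (8.3): P_guess = sum_xi q(xi) max_c P_xi(c)."""
    return sum(q[xi] * max(P[xi].values()) for xi in q)

# Three descriptions that Eve could hold; Claire sees an unbiased bit in all of them.
q = {"xi0": 0.5, "xi1": 0.5}
P_deterministic = {"xi0": {0: 1.0, 1: 0.0}, "xi1": {0: 0.0, 1: 1.0}}
P_partial = {"xi0": {0: 0.9, 1: 0.1}, "xi1": {0: 0.1, 1: 0.9}}
P_unhelpful = {"xi0": {0: 0.5, 1: 0.5}, "xi1": {0: 0.5, 1: 0.5}}

if __name__ == "__main__":
    for name, P in [("deterministic", P_deterministic),
                    ("partial", P_partial),
                    ("unhelpful", P_unhelpful)]:
        print(name, claire_distribution(q, P), guessing_probability(q, P))
```

The first case reaches the maximal value 1, the last one the minimal value $\max_c P(c) = \frac{1}{2}$, and the intermediate case lies in between.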


8.2.2 Guessing probability and randomness extraction 


The direct output of an alleged random process is termed the raw output string. Even if
it contains some randomness, i.e., $P_{\text{guess}} < 1$, Eve may have a lot of information about
it. Besides, Claire does not know what that information is: Eve may have guessed some
rounds very well (for instance if there exists some $\xi$ for which $c$ is deterministic); or
perhaps Eve’s information is the same in each round ... At any rate, Claire would be
ill advised to use that string for cryptographic purposes. She should rather extract the
existing randomness into a shorter string such that her symbols are uniformly distributed
and Eve is uncorrelated, that is $P'_{CE}(c', e') \approx \frac{1}{|C|^{\ell}}\, P'(e')$, where $\ell$ is the length of the shorter string.

The study of extraction constitutes a large body of theoretical and practical knowledge
that goes far beyond the scope of a book on nonlocality. The only role of
nonlocality is to provide an estimate of the guessing probability. So we focus on
the operational meaning of the guessing probability for extraction, leaving aside all
the matters related to the design of extractors. To this effect, we notice that any
extraction procedure can at best preserve the amount of randomness, so the length $\ell$
of the extracted string must be bounded by the requirement that $P'_{\text{guess}}(\ell) \ge P_{\text{guess}}(N)$.
By definition, after the extraction, one should have $P'_{\text{guess}}(\ell) = |C|^{-\ell}$. Thus

$$\ell \le -\log_{|C|} P_{\text{guess}}(N) = H_{\min}(C|E) \tag{8.4}$$

which constitutes a definition of the conditional min-entropy.


5 Technical note: Not to be confused with a mixture of i.i.d. processes, an object that appears in De Finetti-
type theorems. In that case, $P_{\text{guess}}(N) = \sum_\xi q(\xi)\,(P_{\xi,\text{guess}})^N$.

6 The reader in need of a first introduction may appreciate the first section of (Pivoluska and Plesch, 2014).
We also highlight that only some special extractors can be used if Eve has quantum side-information (De et al.,
2012).

Now, there exist extraction procedures that achieve equality in (8.4) in the asymptotic
limit $N \to \infty$. The basic result goes by the improbable name of leftover-hash lemma, and
there have been improvements. We thus have the operational interpretation of the min-
entropy, or the guessing probability, as quantifier of randomness: It is a parameter that
enters the construction of the extractor, specifying the optimal length of the extracted
string.
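
To make the role of (8.4) concrete, here is a minimal sketch that converts a single-round guessing probability into a min-entropy and a maximal extracted length, assuming an i.i.d. process so that $P_{\text{guess}}(N) = (P_{\text{guess}})^N$. The function names and the numerical values are illustrative, and finite-size corrections of actual extractors are ignored.

```python
import math

def min_entropy(p_guess_round, n_rounds, alphabet_size=2):
    """H_min(C|E) = -log_{|C|} P_guess(N), with P_guess(N) = p^N for an i.i.d. process."""
    return -n_rounds * math.log(p_guess_round, alphabet_size)

def max_extracted_length(p_guess_round, n_rounds, alphabet_size=2):
    """Largest integer length compatible with the bound (8.4)."""
    return math.floor(min_entropy(p_guess_round, n_rounds, alphabet_size))

if __name__ == "__main__":
    N = 1000
    for p in (0.5, 0.75, 0.9, 0.99):
        print(p, min_entropy(p, N), max_extracted_length(p, N))
```

For bits, a guessing probability of 0.75 per round leaves about 0.415 extractable bits per round; as the guessing probability approaches 1, the extractable length goes to zero.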

Importantly though, all these procedures are based on seeded extractors. This means 
that the extraction procedure must be chosen among a set of possible ones using a string 
uncorrelated from the process and also random for Eve. If this independent randomness 
is not available, extraction may be reduced or even impossible: We’ll discuss one example 
in section 11.5. The canonical example of deterministic (i.e., seedless) extraction is von 
Neumann extraction. It works only in the simplest case: The process must be an i.i.d. biased
coin, $P(0) = p$ and $P(1) = 1 - p$ in each round, with $\frac{1}{2} \le p < 1$. Besides, if one wants to
extract secret randomness, von Neumann extraction works only if Eve does not have a
better description of the process. In other words, we start from $P_{CE}(c, e) = P(c)\,\delta_{e,0}$ and
$P_{\text{guess}} = p$ in all rounds. The two-symbol sequences $c_1c_2$ have probabilities $P(00) = p^2$,
$P(01) = P(10) = p(1 - p)$ and $P(11) = (1 - p)^2$. Claire discards the sequences 00
and 11, and replaces the sequence 01 by 0 and the sequence 10 by 1. The length $\ell$ of the
final string depends on the initial string; on average, $\langle\ell\rangle = Np(1 - p)$. Notice that this is
smaller than the min-entropy, which is $N(-\log_2 p)$ for $p > 1 - p$. Nonetheless, extraction
has been achieved: For any given $\ell$, the final distribution is $P'(c', e') = \frac{1}{2^{\ell}}\, P'(e')$ for all $c'$,
whatever Eve does to her data to obtain $e'$.
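
The procedure just described fits in a few lines. The sketch below simulates an i.i.d. biased coin with Python's pseudo-random generator (used here only as a convenient stand-in for the coin; the seed and parameter values are arbitrary choices) and applies von Neumann extraction to it.

```python
import random

def von_neumann_extract(bits):
    """Read the string in pairs: 01 -> 0, 10 -> 1, while 00 and 11 are discarded."""
    return [c1 for c1, c2 in zip(bits[0::2], bits[1::2]) if c1 != c2]

def biased_coin(n, p, seed=0):
    """n i.i.d. bits with P(0) = p; the fixed seed only makes the sketch reproducible."""
    rng = random.Random(seed)
    return [0 if rng.random() < p else 1 for _ in range(n)]

if __name__ == "__main__":
    N, p = 200_000, 0.8
    extracted = von_neumann_extract(biased_coin(N, p))
    print("length:", len(extracted), "to be compared with N p (1 - p) =", N * p * (1 - p))
    print("fraction of ones:", sum(extracted) / len(extracted))
```

The extracted length should fluctuate around $Np(1-p)$, and the extracted bits should be close to unbiased even though the raw coin is not.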


8.2.3 Randomness in i.i.d. strategies and beyond 


If Claire’s data are compatible with the i.i.d. assumption, intuition suggests that a
description in terms of an i.i.d. strategy should be fine.⁷ For probability distributions, this
intuition was put on rigorous mathematical grounds in the famous De Finetti theorem.
This theorem, which can be applied to conditional processes too, was later generalized
to states in quantum theory. For these developments I refer the reader to the masterful
review by Renner (2007).

However, it took more effort before the suitable tool for DI certification was found.⁸
The breakthrough came with the entropy accumulation theorem (Arnon-Friedman et al.,
2019), which controls the convergence of entropic quantities towards values that would
be obtained with i.i.d. strategies. This is sufficient for randomness and quantum key
distribution, which indeed rely on entropies, though it cannot be used for self-testing, for
instance. Therefore, since we present only asymptotic results here, for the remainder of
this chapter we can work under the i.i.d. assumption, knowing that those results will hold
in general.


7 This is particularly clear in a Bayesian approach: If Claire starts with an i.i.d. prior belief, and the updates
do not challenge it, then there is no reason to change that belief.

8 The quantum De Finetti-type theorems depend on the knowledge of the Hilbert space dimension: In DI,
one does not want to bound the dimension, thus the asymptotic convergence cannot be guaranteed. A De
Finetti-type theorem for behaviors was eventually found, but it brings into play signaling i.i.d. behaviors, so
once again it is not suitable for DI certification (Arnon-Friedman and Renner, 2015).


8.2.4 Warm up: Randomness in characterized 
quantum systems 


The quantification of process randomness is not among the standard calculations of 
quantum theory. This is why, before going to the device-independent estimates that are 
proper for nonlocality, it is useful to study process randomness for characterized devices.
Claire starts by characterizing the single-round state p through tomography (deviations 
from an exact i.i.d. description of the state can be taken care of by De Finetti-type 
theorems, since the dimension is known). Knowing the state, she performs suitable 
measurements. We will look at two cases. 

The first case is that of Claire using the same measurement $\mathcal{M} = \{\Pi_c, c \in C\}$ in all
rounds. Her statistics are obviously given by $P(c) = \mathrm{Tr}(\rho\,\Pi_c)$. Eve’s knowledge in any
given round is described by a $\rho_\xi$, with $\rho = \sum_\xi q_\xi\,\rho_\xi$. An upper bound $P \ge P_{\text{guess}}$ is
obtained if the decomposition is the most favorable for Eve’s guessing. By linearity of
quantum theory, all the processes that lead to guessing the same output $c$ can be grouped
in a single sub-normalized state

$$\rho_c = \sum_{\xi | c(\xi) = c} q_\xi\, \rho_\xi. \tag{8.5}$$

Thus, the desired upper bound on $P_{\text{guess}}$ will be the solution of

$$\begin{array}{rl}
P = \max_{\{\rho_c\}} & \mathrm{Tr}\big(\sum_c \Pi_c\, \rho_c\big) \\
\text{subject to} & \rho_c \ge 0 \text{ for all } c \in C, \\
& \sum_c \rho_c = \rho.
\end{array} \tag{8.6}$$


This is a semi-definite program (SDP) like those we encountered in section 6.3 and it 
can be solved by the same algorithms (recall in particular that the solution is guaranteed 
to be correct within numerical precision). 

Here comes a case study. Claire knows she has a qubit and has characterized its
state as being $\rho = \frac{1}{2}(\mathbb{1} + \eta\,\sigma_z)$. She performs the projective measurement of $\sigma_x$, that is
$\Pi_\pm = |{\pm}\hat{x}\rangle\langle{\pm}\hat{x}|$. Her outputs are unbiased: $P(c = +1) = P(c = -1) = \frac{1}{2}$. But Eve can
do better: The solution of (8.6) yields the upper bound $P = \frac{1}{2}\big(1 + \sqrt{1 - \eta^2}\big)$, obtained
for $\rho_\pm = \frac{1}{2}|\chi_\pm\rangle\langle\chi_\pm|$ with $|\chi_\pm\rangle = \sqrt{(1+\eta)/2}\,|{+}\hat{z}\rangle \pm \sqrt{(1-\eta)/2}\,|{-}\hat{z}\rangle$ (more details in
Appendix G.5, where we also show an example of the advantage of keeping quantum
side-information).
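
The decomposition quoted in this case study can be checked directly. The sketch below is a feasibility check, not an SDP solver: it verifies that the two sub-normalized states satisfy the constraints of (8.6) and that the objective takes the stated value, but it does not prove optimality. The function names are hypothetical.

```python
import numpy as np

def eve_states(eta):
    """Sub-normalized states rho_+ and rho_-, written in the basis {|+z>, |-z>}."""
    a = np.sqrt((1 + eta) / 2)
    b = np.sqrt((1 - eta) / 2)
    chi_plus = np.array([a, b])
    chi_minus = np.array([a, -b])
    return 0.5 * np.outer(chi_plus, chi_plus), 0.5 * np.outer(chi_minus, chi_minus)

def sigma_x_projectors():
    plus_x = np.array([1.0, 1.0]) / np.sqrt(2)
    minus_x = np.array([1.0, -1.0]) / np.sqrt(2)
    return np.outer(plus_x, plus_x), np.outer(minus_x, minus_x)

def check_decomposition(eta):
    """Return (constraints satisfied?, value of the objective of (8.6))."""
    rho = 0.5 * (np.eye(2) + eta * np.diag([1.0, -1.0]))
    rho_p, rho_m = eve_states(eta)
    pi_p, pi_m = sigma_x_projectors()
    feasible = (np.allclose(rho_p + rho_m, rho)
                and np.all(np.linalg.eigvalsh(rho_p) > -1e-12)
                and np.all(np.linalg.eigvalsh(rho_m) > -1e-12))
    value = np.trace(pi_p @ rho_p) + np.trace(pi_m @ rho_m)
    return bool(feasible), float(value)

if __name__ == "__main__":
    for eta in (0.0, 0.6, 1.0):
        print(eta, check_decomposition(eta), 0.5 * (1 + np.sqrt(1 - eta ** 2)))
```

For $\eta = 0$ (maximally mixed state) Eve guesses perfectly, while for $\eta = 1$ (pure state $|{+}\hat{z}\rangle$) she does no better than Claire's unbiased statistics suggest.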

As a second case study, we look at what happens if Claire alternates among two or more
measurements. Our Eve knows which measurement is being performed in each round.
However, measurement independence requires her description of the process to be
uncorrelated from the measurement that is being performed; and no quantum description
can determine all the outputs of a family of incompatible measurements. Thus, in addition
to possible randomness in the state, there is an additional randomness coming from the
measurements. A striking example is the measurement of a qubit in the maximally mixed
state $\rho = \frac{1}{2}\mathbb{1}$ along two or three mutually unbiased bases: Using a simple generalization of
(8.6), the best guessing probabilities are found to be $P = \frac{1}{2}\big(1 + \frac{1}{\sqrt{2}}\big)$ and $P = \frac{1}{2}\big(1 + \frac{1}{\sqrt{3}}\big)$
respectively⁹ (Law et al., 2014).
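
These two values can be made plausible with a short calculation on the Bloch sphere. The sketch below is not the generalized SDP mentioned in the text; it assumes that each of the $m$ mutually unbiased bases (taken along the first $m$ coordinate axes) is chosen with probability $1/m$, and that Eve's description in each round is a pure state with Bloch vector $\vec{r}$. Since the pair $\{+\vec{r}, -\vec{r}\}$ with equal weights averages to the maximally mixed state, it suffices to maximize over a single Bloch vector. Function names are hypothetical.

```python
import math
import random

def average_guess(bloch, n_bases):
    """Measurement along axis j of a pure state with Bloch vector r: the most
    probable outcome has probability (1 + |r_j|) / 2. Average over the bases."""
    return sum(0.5 * (1 + abs(bloch[j])) for j in range(n_bases)) / n_bases

def random_unit_vector(rng):
    v = [rng.gauss(0, 1) for _ in range(3)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def best_found(n_bases, samples=20000, seed=1):
    """Crude random search over pure states (a heuristic, not a proof)."""
    rng = random.Random(seed)
    return max(average_guess(random_unit_vector(rng), n_bases) for _ in range(samples))

def closed_form(n_bases):
    """Value quoted in the text for n_bases = 2 and 3."""
    return 0.5 * (1 + 1 / math.sqrt(n_bases))

if __name__ == "__main__":
    for m in (2, 3):
        print(m, best_found(m), closed_form(m))
```

The closed form is attained by a Bloch vector with equal components along the measured axes, and the random search should approach it from below.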

As a last remark: In these two examples, we have given both the state and the 
measurements. It may seem more natural to give the state and look for the optimal 
measurements. This optimization is no longer an SDP, so numerical methods can provide 
only heuristic bounds. In some simple cases, the exact solutions are known analytically. 
For instance, the choices of measurements in the two examples just mentioned are
provably optimal, whereas the optimal solution is not known if Claire can alternate among
four or more measurements on a qubit.


8.3 Device-Independent Certification of Randomness 


We are now in a position to tackle the central section of this chapter: The DI certification 
of the amount of randomness. In Appendix F2.1 the reader can find more information 
about the origin and development of the field, as well as further references. 


8.3.1 Generic formulation 


In a Bell scenario, Claire is replaced by Alice and Bob (and others if the scenario is 
multipartite, which we won’t consider here). There should be some randomness for every 
pair of inputs and it may be advantageous to extract it all (Bancal et al., 2014); but for 
simplicity, here we discuss only the randomness of the outputs $c = (a_0, b_0)$ for the pair
of inputs $x = y = 0$.

Alice and Bob see the behavior $\mathcal{P}$. Eve knows the actual behavior in each round, one
of the $\mathcal{P}_\xi$ with $\mathcal{P} = \sum_\xi q_\xi\,\mathcal{P}_\xi$. Since convex combinations of behaviors are still behaviors,
in analogy with what we did in subsection 8.2.4, we can group together all the processes
that lead to guessing the same output in a sub-normalized behavior:

$$\mathcal{P}_{(a,b)} = \sum_{\xi | c(\xi) = (a,b)} q_\xi\, \mathcal{P}_\xi. \tag{8.7}$$


9 By rewriting the result for two bases in terms of min-entropies, one recovers a state-independent entropic 
uncertainty relation derived in the pioneering work of Maassen and Uffink (1988). 
Besides, by definition $\max_{(a',b')} P_{(a,b)}(a',b'|0,0) = P_{(a,b)}(a,b|0,0)$. Thus, the desired
upper bound on $P_{\mathrm{guess}}$ will be the solution of


$$\begin{aligned}
\bar{P} = \max_{\{P_{(a,b)}\}} \; & \sum_{(a,b)} P_{(a,b)}(a,b|0,0) \\
\text{subject to} \; & \sum_{(a,b)} P_{(a,b)} = P, \\
& P_{(a,b)} \in Q \ \text{for all } a \in A,\ b \in B.
\end{aligned} \tag{8.8}$$


The last line forces Eve’s descriptions to be quantum behaviors, possibly sub-normalized;
we have absorbed into it the positivity condition $P_{(\alpha,\beta)}(a,b|x,y) \geq 0$ for all $a, b, x, y$.
Without a constraint on the acceptable set of behaviors, the trivial upper bound $\bar{P} = 1$
is always achievable (if $P$ is nonlocal, some $P_{(a,b)}$ will have to be signaling). As we have
seen in section 6.3, the membership in the quantum set is not easy to handle but it can be
approximated by semi-definite criteria. Besides, if $Q$ is replaced by a $Q_\ell \supseteq Q$ as defined
there, the optimization will lead to $\bar{P}_\ell \geq \bar{P}$, so the bound remains a valid upper bound.


8.3.2 Case study: Randomness in Alice’s output from CHSH 


As a case study, we consider a CHSH test that has led to an observed value $S = S_{\mathrm{obs}} > 2$.
If $S_{\mathrm{obs}} > 2$, there is randomness both in Alice’s and in Bob’s outputs. We are going to
find an upper bound for the probability of guessing one of Alice’s outputs, say $a_0$. The
optimization to be performed is


$$\begin{aligned}
\bar{P} = \max_{P_+, P_-} \; & P_+(a = +1|x = 0) + P_-(a = -1|x = 0) \\
\text{subject to} \; & S[P_+ + P_-] = S_{\mathrm{obs}}, \\
& P_+, P_- \in Q.
\end{aligned} \tag{8.9}$$


Notice that the only constraint related to observation is the value of $S$: We are not
requesting $P_+ + P_-$ to be equal to the observed behavior.
This optimization can be cast as an optimization over a single behavior:


$$\begin{aligned}
\bar{P} = \max_{P'} \; & P'(a = +1|x = 0) \\
\text{subject to} \; & S[P'] = S_{\mathrm{obs}}, \\
& P' \in Q.
\end{aligned} \tag{8.10}$$


Indeed, define $P'_-(a,b|x,y) = P_-(-a,-b|x,y)$. Clearly, $P'_- \in Q$ if $P_- \in Q$; besides, since
flipping both outputs does not change the correlations and CHSH depends only on
correlations, we have $S[P'_-] = S[P_-]$. The form (8.10) is then obtained by letting
$P' = P_+ + P'_-$. We stress that the single-behavior optimization was not written a priori:
It was derived from the rigorous optimization (8.9) using the symmetry of CHSH
(Exercise 8.1).
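The symmetry step can be checked numerically. The sketch below stores a (possibly sub-normalized) behavior as an array `P[a, b, x, y]`, with output index 0 standing for $+1$ and index 1 for $-1$; the sign convention chosen for CHSH is an assumption, and the random tables are arbitrary non-negative arrays rather than quantum behaviors, since the identities being checked are purely algebraic.

```python
import numpy as np

SIGNS = np.array([[1, -1], [-1, 1]])  # product of the outputs; index 0 <-> +1, index 1 <-> -1


def chsh(P):
    """CHSH value E00 + E01 + E10 - E11 of an array P[a, b, x, y] (assumed sign convention)."""
    E = np.einsum("ab,abxy->xy", SIGNS, P)
    return E[0, 0] + E[0, 1] + E[1, 0] - E[1, 1]


def flip_outputs(P):
    """P'(a, b | x, y) = P(-a, -b | x, y): reverse both output indices."""
    return P[::-1, ::-1, :, :]


def alice_marginal(P, a, x):
    """P(a | x), read off at y = 0 (for a no-signaling P the choice of y is irrelevant)."""
    return P[a, :, x, 0].sum()


rng = np.random.default_rng(seed=1)
P_plus = rng.random((2, 2, 2, 2)) / 8    # arbitrary non-negative, sub-normalized tables
P_minus = rng.random((2, 2, 2, 2)) / 8
P_prime = P_plus + flip_outputs(P_minus)

# Flipping both outputs leaves CHSH unchanged ...
print(np.isclose(chsh(flip_outputs(P_minus)), chsh(P_minus)))
# ... so the constraint of (8.9) carries over to P' ...
print(np.isclose(chsh(P_prime), chsh(P_plus + P_minus)))
# ... and the objective of (8.9) becomes a single marginal of P'.
print(np.isclose(alice_marginal(P_prime, 0, 0),
                 alice_marginal(P_plus, 0, 0) + alice_marginal(P_minus, 1, 0)))
```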

Now we proceed to solve (8.10) analytically, following (Pironio et al., 2010). Recall
that we can work with projective measurements and pure states without loss of generality,
since the dimensions of the Hilbert spaces are not fixed and the purification has not leaked
out from the secure location. Now we can use Jordan’s lemma, which we have already
mentioned in the previous chapter and is proved in Appendix G.4: If $A_0$ and $A_1$ are
Hermitian operators on an arbitrary Hilbert space $\mathcal{H}$ with eigenvalues $-1$ and $+1$, there
exists a basis in which both operators are block-diagonal,


$$A_x = \bigoplus_\alpha A_x^\alpha = \bigoplus_\alpha \hat{a}_x^\alpha \cdot \vec{\sigma}, \tag{8.11}$$

with blocks of size at most $2 \times 2$ (the subspaces in which both operators commute will
be characterized by $\hat{a}_0^\alpha = \pm\hat{a}_1^\alpha$). Of course, the same holds for Bob:

$$B_y = \bigoplus_\beta B_y^\beta = \bigoplus_\beta \hat{b}_y^\beta \cdot \vec{\sigma}. \tag{8.12}$$


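The unknown state can then be written $|\Psi\rangle = \bigoplus_{\alpha,\beta} \sqrt{p_{\alpha\beta}}\, |\psi^{\alpha\beta}\rangle$, where $|\psi^{\alpha\beta}\rangle$ is a normalized
two-qubit state and $\sum_{\alpha,\beta} p_{\alpha\beta} = 1$. Also,


$$S_{\mathrm{obs}} = \sum_{\alpha,\beta} p_{\alpha\beta}\, \langle \psi^{\alpha\beta} | S^{\alpha\beta} | \psi^{\alpha\beta} \rangle, \tag{8.13}$$


with $S^{\alpha\beta} = \sum_{x,y} (-1)^{xy} A_x^\alpha \otimes B_y^\beta$. Finally,


$$P'(a = 0|x = 0) = \sum_{\alpha,\beta} p_{\alpha\beta}\, \langle \psi^{\alpha\beta} | \Pi_0^\alpha \otimes \mathbb{1} | \psi^{\alpha\beta} \rangle. \tag{8.14}$$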
ap 


The optimization (8.10) is now cast as the maximization of a convex sum $\bar{P} = \sum_k p_k P_k$
under a convex constraint $S = \sum_k p_k S_k = S_{\mathrm{obs}}$. Let us suppose that we find the solution
for the elementary problem $\max_{S_k = s} P_k = f(s)$. If in addition $f''(s)$ does not change sign,
we have two possibilities:


(i) If $f$ is concave ($f'' \leq 0$), Jensen’s inequality gives $\sum_k p_k f(S_k) \leq f(S_{\mathrm{obs}})$, so the
solution to the main problem is simply $\bar{P} = f(S_{\mathrm{obs}})$, obtained by setting all the $S_k = S_{\mathrm{obs}}$.


(ii) If $f$ is convex ($f'' \geq 0$), the solution to the main problem is given by the convex
combination of boundary points. For our problem, the boundary $S = 2\sqrt{2}$ has $P = \frac{1}{2}$;
the boundary $S = 2$ has $P = 1$; so, if $f$ were convex, with $S_{\mathrm{obs}} = p\,2\sqrt{2} + (1 - p)\,2$
the solution would be $\bar{P} = p\,\frac{1}{2} + (1 - p) = 1 - \frac{1}{2}(S_{\mathrm{obs}} - 2)/(2\sqrt{2} - 2)$.


So, our next step is to find the maximal value of $\langle \psi | \Pi_0 \otimes \mathbb{1} | \psi \rangle$ over the set of pure two-qubit
states for a given value $S = s$. From (3.17), we know that the pure state $|\psi(\theta)\rangle =
\cos\theta|00\rangle + \sin\theta|11\rangle$, with $0 \leq \theta \leq \pi/4$, can reach $s = 2\sqrt{1 + \sin^2 2\theta}$. By writing down the
projector for that state, we see that its most biased marginal is $P(+|\hat{z}) = \frac{1}{2}(1 + \cos 2\theta)$. States with
higher values of $\theta$ (more entangled) can also reach the same $s$ (for suboptimal measurements), but their
most biased marginal is definitely lower; states with smaller values of $\theta$ cannot reach $s$.




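Therefore, by solving explicitly for $\theta$, we find, after trivial algebra, $f(s) = \frac{1}{2}\left(1 + \sqrt{2 - (s/2)^2}\right)$.
This function being concave ($f''(s) < 0$ for $2 \leq s < 2\sqrt{2}$), Jensen’s inequality guarantees that no
convex combination of blocks can do better than $f(S_{\mathrm{obs}})$: We have found the solution to the general
problem,


$$\bar{P}(S_{\mathrm{obs}}) = \frac{1}{2}\left[1 + \sqrt{2 - \left(\frac{S_{\mathrm{obs}}}{2}\right)^2}\,\right]. \tag{8.15}$$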
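The algebra leading to (8.15) can be cross-checked numerically. The sketch below eliminates $\theta$ between $s = 2\sqrt{1 + \sin^2 2\theta}$ and the marginal $\frac{1}{2}(1 + \cos 2\theta)$, assuming $0 \leq \theta \leq \pi/4$, and compares $f$ with the straight line joining the boundary points $(2, 1)$ and $(2\sqrt{2}, \frac{1}{2})$. The function names are illustrative.

```python
import numpy as np


def s_max(theta):
    """Largest CHSH value reachable with cos(theta)|00> + sin(theta)|11>, cf. (3.17)."""
    return 2 * np.sqrt(1 + np.sin(2 * theta) ** 2)


def biased_marginal(theta):
    """Most biased marginal of the same state."""
    return 0.5 * (1 + np.cos(2 * theta))


def f(s):
    """Closed form (8.15); the clip only absorbs rounding errors at s = 2*sqrt(2)."""
    return 0.5 * (1 + np.sqrt(np.clip(2 - (s / 2) ** 2, 0, None)))


def chord(s):
    """Linear interpolation between the boundary points (2, 1) and (2*sqrt(2), 1/2)."""
    return 1 - 0.5 * (s - 2) / (2 * np.sqrt(2) - 2)


thetas = np.linspace(0, np.pi / 4, 101)
s_grid = np.linspace(2, 2 * np.sqrt(2), 101)

# f evaluated on s(theta) reproduces the biased marginal ...
print(np.allclose(f(s_max(thetas)), biased_marginal(thetas)))
# ... and f never falls below the chord, so mixing boundary points cannot help Eve.
print(np.all(f(s_grid) >= chord(s_grid) - 1e-12))
```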


Two remarks can be made at this stage:


• There is an interesting connection with self-testing. Let us take the behavior that
gives the highest violation of CHSH for a non-maximally entangled two-qubit state
$|\psi(\theta)\rangle$, having chosen the measurements such that $A_0 = \sigma_z$. In this case we would
have $P(a = 0|x = 0) = \bar{P}(S_{\mathrm{obs}})$. But the r.h.s. is an upper bound on Eve’s guessing
probability, while the l.h.s. is part of the behavior observed by Alice. Thus, Eve’s
knowledge does not allow her to guess better than Alice. For the power given to Eve,
we expect this to be the case only when she knows that the state is pure. In fact, one
can prove that the observation of both $S[P] = S_{\mathrm{obs}}$ and $P(a = 0|x = 0) = \bar{P}(S_{\mathrm{obs}})$
self-tests $|\psi(\theta)\rangle$ (Exercise 8.2).


• Having the guessing probability, one can compute the min-entropy (8.4), which
is a strict lower bound on the amount of randomness. But in practice, a perfect
parameter estimate (e.g., of $S_{\mathrm{obs}}$) cannot be obtained from finite data; and the
absolute worst-case assessment is always that there is no randomness, since any
finite string may be the outcome of a local process, however unlikely. This is why
one rather opts for deriving rigorous bounds for the typical min-entropy (Pironio
and Massar, 2013; Arnon-Friedman et al., 2018). For instance, for the randomness
of $a_0$ as a function of the violation of CHSH one would get in the asymptotic limit


$$r_{\mathrm{typ}} = 1 - h\left(\frac{1}{2} + \frac{1}{2}\sqrt{\left(\frac{S_{\mathrm{obs}}}{2}\right)^2 - 1}\right), \tag{8.16}$$


where $h(p) = -p\log_2 p - (1 - p)\log_2(1 - p)$ is the binary Shannon entropy. Notice
that $r_{\mathrm{typ}} \geq -\log_2 \bar{P}(S_{\mathrm{obs}})$ computed from (8.15): Even in the asymptotic limit,
a typicality result is more optimistic than a strict one. We won’t discuss these
matters further here: They are very important for information processing and actual
experiments [see e.g., (Shen et al., 2018)], but they don’t shed further light on the
role of nonlocality.
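To make the comparison in the second remark concrete, the following sketch evaluates the strict min-entropy $-\log_2 \bar{P}(S_{\mathrm{obs}})$ from (8.15) and the asymptotic typical rate (8.16) on a grid of CHSH values. The helper names are illustrative, and the clipping only guards against rounding at the endpoints of the interval $[2, 2\sqrt{2}]$.

```python
import numpy as np


def binary_entropy(p):
    """h(p) = -p log2 p - (1-p) log2 (1-p), with p clipped away from 0 and 1."""
    p = np.clip(p, 1e-15, 1 - 1e-15)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)


def guess_bound(s):
    """Upper bound (8.15) on the guessing probability."""
    return 0.5 * (1 + np.sqrt(np.clip(2 - (s / 2) ** 2, 0, None)))


def min_entropy(s):
    """Strict bound: -log2 of the guessing probability."""
    return -np.log2(guess_bound(s))


def typical_rate(s):
    """Asymptotic typical min-entropy rate (8.16)."""
    return 1 - binary_entropy(0.5 + 0.5 * np.sqrt(np.clip((s / 2) ** 2 - 1, 0, None)))


s_values = np.linspace(2, 2 * np.sqrt(2), 9)
for s, strict, typical in zip(s_values, min_entropy(s_values), typical_rate(s_values)):
    print(f"S = {s:.3f}   H_min = {strict:.4f}   r_typ = {typical:.4f}")
```

Both quantities vanish at $S = 2$ and reach one bit at $S = 2\sqrt{2}$; in between, the typical rate lies above the strict one.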


8.3.3 Optimizing the amount of randomness 


Among the many other results that have been obtained in DI randomness generation, 
we focus on the discussions about extracting the maximal amount of randomness. The 
pioneering paper is (Acín et al., 2012), but we describe later results that encompass and
clarify those ones. 
We have just seen that the maximal violation of CHSH guarantees that one of Alice’s
outputs is maximally random: It’s a bit with $P(0) = P(1) = \frac{1}{2}$. The first natural extension
is to extract randomness from one output of Alice and one of Bob. These are two bits: For
maximal randomness, $P(a_x, b_y) = \frac{1}{4}$ must hold for two suitable inputs $x, y$. In the behavior
(3.11) that maximizes CHSH, however, all the probabilities are $\frac{1}{4}\left(1 \pm \frac{1}{\sqrt{2}}\right)$. In order to
extract two bits of randomness, one can add a third input $y = 2$ to Bob with the property
$\langle A_0 B_2 \rangle = 0$ and $\langle A_1 B_2 \rangle = 1$. If $S = 2\sqrt{2}$ with $x, y \in \{0, 1\}$, we know that (up to local
isometries) the state is $|\Phi^+\rangle$, $A_0$ is $\sigma_z$ and $A_1$ is $\sigma_x$. Given this, $\langle A_1 B_2 \rangle = 1$ implies that
$B_2$ is $\sigma_x$ on Bob; therefore $A_0 B_2$ acts as $\sigma_z \otimes \sigma_x$ on $|\Phi^+\rangle$, whence $P(a_0, b_2) = \frac{1}{4}$ as desired
(Mironowicz and Pawłowski, 2013; Law et al., 2014).
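The construction with a third setting for Bob can be verified on the ideal two-qubit realization. The sketch below assumes the standard choice $A_0 = \sigma_z$, $A_1 = \sigma_x$, $B_{0,1} = (\sigma_z \pm \sigma_x)/\sqrt{2}$ and $B_2 = \sigma_x$ on $|\Phi^+\rangle$; it only checks the ideal statistics and says nothing about the device-independent argument itself.

```python
import numpy as np
from itertools import product

SX = np.array([[0.0, 1.0], [1.0, 0.0]])
SZ = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)
PHI_PLUS = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)


def correlator(A, B, psi=PHI_PLUS):
    """Expectation value <A (x) B> on the real state psi."""
    return float(psi @ np.kron(A, B) @ psi)


def joint_probabilities(A, B, psi=PHI_PLUS):
    """P(a, b) for the outcomes a, b in {+1, -1} of the +-1-valued observables A and B."""
    probs = {}
    for a, b in product((+1, -1), repeat=2):
        projector = np.kron((I2 + a * A) / 2, (I2 + b * B) / 2)
        probs[(a, b)] = float(psi @ projector @ psi)
    return probs


A0, A1 = SZ, SX
B0, B1 = (SZ + SX) / np.sqrt(2), (SZ - SX) / np.sqrt(2)
B2 = SX   # the additional setting y = 2

S = correlator(A0, B0) + correlator(A0, B1) + correlator(A1, B0) - correlator(A1, B1)
print("S =", S, " <A0 B2> =", correlator(A0, B2), " <A1 B2> =", correlator(A1, B2))
print("P(a0, b0):", joint_probabilities(A0, B0))   # biased: not two full bits
print("P(a0, b2):", joint_probabilities(A0, B2))   # uniform over the four outcomes
```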

Next, having a maximally entangled state and/or the maximal violation of a tight Bell 
inequality is not the only way to get maximal randomness. Specifically, one can devise 
a Bell test that extracts two bits of randomness for any non-maximally entangled state of 
two qubits. From the state behavior (3.29), it is clear that this can be achieved only if $A_2$
and $B_2$ self-test two orthogonal directions in the $(\hat{x}, \hat{y})$ plane of the Bloch sphere. Having
noticed this, the solution follows the same pattern as before: One self-tests $|\psi(\theta)\rangle$ as well
as the two suitable additional measurements. 

But self-testing inspires further extensions. For the sake of the example, let us go 
back again to the self-testing of $|\Phi^+\rangle$: A qubit is being measured on Alice’s side, and an
extremal POVM on qubits has four outcomes. If one could self-test that an additional
measurement $A_2$ is a suitable such POVM, we would have $P(a_2) = \frac{1}{4}$, i.e., we could
extract two bits of randomness from Alice’s certified qubit alone. Two explicit solutions
have been given in (Acín et al., 2016). 

Shortly afterwards it was proved that this is not the limit yet: One can in principle extract an
unbounded amount of randomness from Alice’s certified qubit in a device-independent
way (Curchod et al., 2017). The protocol builds on a remarkable observation (Silva
et al., 2015): If Alice’s operation consists of a sequence of suitable weak measurements
labeled by $j$, each of the behaviors $P(a_j, b|x_j, y)$ can violate CHSH. Such a sequence of
measurements cannot be described as a single POVM if the choice of a later measurement
may depend on the output of the previous ones; thus, this protocol is not bound to yield
at most 2 bits of randomness.¹⁰


8.4 Device-Independent Quantum Key Distribution 


8.4.1 Introduction to QKD 


As mentioned at the beginning of Part II and in detail in Appendix E, it is by studying
quantum key distribution (QKD) that the idea of device-independent certification came
to the fore. It is fair to finish this part with some comments about this task. For readers
who want to know more: The basic notions of QKD can be gathered from the review
(Scarani et al., 2009); to study DI security of QKD, it is good to start with the first such
proof, that assumed an i.i.d. process (Acín et al., 2007; Pironio et al., 2009), then go
straight to the application of the entropy accumulation theorem that vindicated the same
asymptotic bound without that assumption (Arnon-Friedman et al., 2019, 2018).


10 For characterized devices, the possibility of extracting unbounded randomness from a qubit is trivial even
with a fixed sequence of strong measurements: One can first measure $\sigma_z$, then $\sigma_x$, then again $\sigma_z$, then $\sigma_x$, and so on.
After the first measurement, the information about the initial state has been erased, and thus not more than
one bit of randomness is extracted from the original state, compatible with the fact that the measurements are
projective. The subsequent randomness comes from the incompatibility of measurements themselves, not from
the initial state.

The goal of QKD is the generation of a secret key by two distant players, Alice and 
Bob. A secret key is a string of numbers that must be identical for Alice and Bob and
random for Eve. In a nutshell, the procedure consists of three steps:


1. By exchanging quantum signals, Alice and Bob generate their own raw strings of
bits (raw keys).

2. By classical communication, they run an error correction protocol, at the end of 
which Alice’s and Bob’s strings are identical. 


3. To remove Eve’s information, they apply the same randomness extraction protocol 
on their strings. This step is called privacy amplification. 


Thus, like in the case of randomness, the role of quantum theory (and of nonlocality in 
the DI framework) is to provide an estimate of min-entropy. 

The task of QKD thus requires the existence of two separated secure locations, connected
by two channels (Figure 8.1, bottom). On the channel used to send quantum signals
(quantum channel) Eve can tamper as much as she wants. The classical channel should
be “authenticated”: This means that Alice and Bob know that they are talking to one
another, and Eve can only listen but not tamper (if Eve could impersonate Bob, Alice
would end up establishing the secret key with her and the task would be impossible).


8.4.2 Differences between QKD and randomness 


The setting of two separated secure locations, connected by the two channels just 
described, makes QKD more demanding than randomness generation. We finish this 
chapter by going through the main differences. 

First, for the authentication of the classical channel, Alice and Bob need an initial 
shared secret. This is obvious without any knowledge of the technicalities of authentica- 
tion: If I want to make sure that the person I am talking to is the right one, I must be able 
to request some identity token, previously agreed upon. Thus, there is no key generation:
Only key expansion. 

Second, on the quantum channel, Eve can tamper with the state. If she knew in 
advance which measurements are being performed in each round, she could realize a 
man-in-the-middle attack (a.k.a. intercept-resend): She would perform the measure- 
ments herself, learn the output and resend a signal to Alice and/or Bob prepared in the 
corresponding eigenstate. To avoid this, in QKD the inputs of the process must be random
for Eve. Therefore, Alice and Bob must have at least some initial randomness in
their locations. Usually, it is assumed that there exist local random processes inside each
location: For QKD, this does not defeat the purpose, since the desired product is not
local randomness but a joint secret key.

For the same reason that she has access to the system, a technologically unbounded 
Eve can naturally hold quantum side-information. Bounded-storage models, which restrict
Eve’s ability to hold on to the purification, have been studied, but usually they are not
invoked in actual security proofs.

The most important difference is the issue of spatial distance. In randomness 
generation, the distance between Alice and Bob can be kept at the minimum (e.g., 
dictated by the locality loophole), which is just as well, since it all must fit in the same 
secure location. Conversely, in the case of QKD, the larger the distance between Alice 
and Bob, the more relevant the task. However, longer distances imply greater losses on 
the quantum channel; and if the losses are too high, the detection loophole cannot be 
closed. Even assuming that the detection loophole is closed, losses introduce errors in 
the raw strings of Alice and Bob, and error correction will consume part of the min- 
entropy. In other words, a loophole-free Bell test can be almost immediately read as DI 
randomness generation, but not as DI certification of QKD. For the latter, a direct link 
solution is probably not feasible and is certainly limited (see Exercise 8.3 for a rough 
estimate). Other solutions are being considered: For the state-of-the-art at the moment 
of writing, we refer to (Mattar et al., 2018). 


EXERCISES 


Exercise 8.1 Generalizing subsection 8.3.2, we now want to bound the randomness associated
to the pair $(a_0, b_0)$ of outputs of CHSH, knowing only the observed violation $S_{\mathrm{obs}}$.


(a) Write down the optimization, analog of (8.9).
(b) Prove that the symmetries of CHSH allow the reduction to a two-behavior optimization,
but not to a single-behavior optimization like (8.10).
Remark: For CHSH, it was checked that the result of the two-behavior optimization is the 
same as that one of a single-behavior optimization. This observation points to a property of the 
solution that may provide some insight; but cannot be used to justify the use of a single-behavior 
optimization a priori. 


Exercise 8.2. At the end of subsection 8.3.2, we made the remark that the joint observation
of $S_{\mathrm{obs}}$ given by (3.18) for CHSH, together with $P(a = 0|x = 0) = \bar{P}(S_{\mathrm{obs}})$ given in (8.15),
is a self-testing of the state $|\psi(\theta)\rangle = \cos\theta|00\rangle + \sin\theta|11\rangle$. Prove that this is the case:
(a) By writing an ab initio proof along the same lines as the one in subsection 7.1.1.
(b) By proving that those observations match the self-testing criterion for that state given
in (Bamps and Pironio, 2015): The maximal violation of the tilted CHSH inequality


$$\alpha\langle A_0 \rangle + \langle A_0 B_0 \rangle + \langle A_0 B_1 \rangle + \langle A_1 B_0 \rangle - \langle A_1 B_1 \rangle = \sqrt{8 + 2\alpha^2} \tag{8.17}$$


for a particular $\alpha = \alpha(\theta)$ that you can deduce here.
Exercise 8.3. The fraction of extractable secret key in a QKD protocol is estimated by the
difference between the Shannon entropy of Eve and that of Bob on Alice’s string [for precise
statements and justification, refer to the review (Scarani et al., 2009)].

We consider a QKD protocol implemented with a perfect maximally entangled state $|\Phi^+\rangle$
of two photonic qubits. Alice and Bob extract the key by measuring $\sigma_z$. Eve’s information is
estimated in a DI way by checking the violation of CHSH with the optimal measurements
for $|\Phi^+\rangle$.

We assume that both players have detectors with perfect efficiency, and that the source is in
Alice’s lab, so that she detects every photon. The photons traveling to Bob arrive with probability
$\eta$; in the rounds when he does not detect, he chooses a random value for his bit.


1. Prove that the predicted value of CHSH is $S = 2\sqrt{2}\,\eta$.

2. Prove that the error fraction in the raw key is $\varepsilon = P(a_z \neq b_z) = \frac{1}{2}(1 - \eta)$.

3. The fraction of extractable secret key is therefore $r = h(\bar{P}(S)) - h(\varepsilon)$, where $\bar{P}(S)$ is given
by (8.15) and $h$ is the binary Shannon entropy $h(p) = -p\log_2 p - (1 - p)\log_2(1 - p)$.
Compute numerically the value of $\eta$ such that $r = 0$.

4. An optimistic estimate for optical fibers is $\eta = 10^{-d/50}$, where $d$ is the distance measured
in kilometers. Compute what would be the critical distance for a fiber implementation of
this protocol.
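For readers who want to check their answers to items 3 and 4 of Exercise 8.3, here is a minimal numerical sketch. It takes the results of items 1 and 2 for granted, uses the rate formula of item 3 as stated, and assumes the loss model $\eta = 10^{-d/50}$ for item 4 (this exponent is an assumption of the sketch; adapt it if your loss model differs). The root is located by plain bisection, which relies on the rate increasing monotonically with $\eta$ on the bracketing interval.

```python
import math


def h(p):
    """Binary Shannon entropy, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def guess_bound(S):
    """Bound (8.15), clipped to [1/2, 1] to absorb rounding at the endpoints."""
    inner = max(0.0, 2 - (S / 2) ** 2)
    return min(1.0, 0.5 * (1 + math.sqrt(inner)))


def key_rate(eta):
    """r = h(P(S)) - h(eps) with S = 2*sqrt(2)*eta and eps = (1 - eta)/2."""
    S = 2 * math.sqrt(2) * eta
    eps = 0.5 * (1 - eta)
    return h(guess_bound(S)) - h(eps)


def critical_eta(tol=1e-12):
    """Bisection between eta = 1/sqrt(2) (no violation, r < 0) and eta = 1 (r = 1)."""
    lo, hi = 1 / math.sqrt(2), 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if key_rate(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def critical_distance_km(eta):
    """Inverts the assumed loss model eta = 10**(-d/50)."""
    return -50 * math.log10(eta)


eta_c = critical_eta()
print("critical eta:", eta_c)
print("critical distance (km):", critical_distance_km(eta_c))
```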


Part III


Foundational Insights from 
Nonlocality 


Nonlocality has always had a strong “foundational” flavor, manifested mainly in discus- 
sions about interpretations, but also in more technical works. The latter are the object 
of this last part. Being a phenomenon independent of quantum theory, nonlocality can 
be described in other mathematical frameworks. One can then discuss which features of 
nonlocality are robust to a possible change of theory, and which are so dependent on 
quantum theory that could be taken as part of its definition. 


9 


Nonlocality in the No-Signaling 
Framework 


Così tra questa immensità s’annega il pensier mio: e il naufragar m’è dolce in questo
mare.


Amidst this immensity my thought drowns: And to flounder in this sea is sweet to me.
G. Leopardi, L’infinito


In this chapter and the next, we extend the study of Bell nonlocality beyond quantum 
behaviors to consider the broader set of no-signaling behaviors. After introducing the 
formalism, we show that some features of quantum theory are actually shared by any 
theory predicting no-signaling behaviors. 


9.1 The No-Signaling Polytope 


No-signaling (NS) behaviors have been already defined and studied in section 2.2, to 
which we refer. The set of NS behaviors forms a polytope, naturally called the no-signaling 
polytope. Indeed, the convex sum of two NS behaviors is another NS behavior, so the set 
is convex. Besides, once the NS constraints (2.9) are enforced, the possible behaviors are 
only limited by the positivity conditions: Thus, the set of NS behaviors is delimited by 
finitely many hyperplanes. 

Because of this, contrary to what happens with the local polytope, the equations of the 
facets are easily listed, while some work is required to determine the extremal points. The 
local deterministic behaviors remain extremal points also for the no-signaling polytope. 
In addition to those, the polytope has nonlocal extremal behaviors. Clearly, these extremal 
behaviors are not deterministic, since the only deterministic NS behaviors are the local 
deterministic ones (recall Exercise 2.2). The highest violation of an inequality $I \leq I_L$
by the extremal points of the no-signaling polytope is called the no-signaling limit $I_{NS}$;
recall that this may not coincide with the algebraic limit, as shown by the counterexample
of the GYNI inequality in subsection 5.1.3.

The simplest example of nonlocal extremal NS behavior is called PR-behavior after
Popescu and Rohrlich, who used it in a very influential work (described in chapter 10),
although it can already be found in previous works (Rastall, 1985; Khalfin and
Tsirelson, 1985). We
shall study this behavior in the next section. As it turns out, there is no other extremal
NS behavior above each CHSH facet; in other words, the no-signaling polytope of the
CHSH scenario has only eight extremal nonlocal points, that are all equivalent to the
PR-behavior up to the usual relabelings. The most frequent representation of this polytope
is given in Figure 9.1; for plots of other slices with more intricate geometric features, see
(Goh et al., 2018).


Figure 9.1 The no-signaling polytope in the (2, 2; 2, 2) scenario. The drawing represents exactly
the two-dimensional slice of the 8-dimensional object that contains four nonlocal extremal points
(PR-behaviors). The curved line of the quantum set represents the TLM boundary (6.5), the black
dot represents the behavior (3.11) that gives $S = 2\sqrt{2}$ for the form (2.29) of CHSH. On the side,
a pictorial representation of the hypothetical PR-box in the same style as Figure 1.1.

Beyond the CHSH scenario, the extremal NS behaviors have been completely listed 
for the (2, m; 2, m) scenarios (Barrett et al., 2005b) and the (M, 2; M, 2) scenarios
(Barrett and Pironio, 2005; Jones and Masanes, 2005). In both cases, they are rather 
straightforward generalizations of the PR-behavior. Once again, this pleasant intuition 
fails when moving to multipartite scenarios. In the (2, 2; 2, 2; 2, 2) scenario (Pironio 
et al., 2011), there are 45 inequivalent nonlocal extremal NS behaviors. We know 
from subsection 5.1.3 that there are also 45 inequivalent non-trivial facets of the local 
polytope. The equality in number is not accidental (Fritz, 2012), but does not lead to a 
simple geometry, on the contrary: Above any local facet one finds several inequivalent 
extremal NS points; and some of the extremal NS points do not achieve $I_{NS}$ for any
local facet. 
9.2 The PR-Box


9.2.1 The PR-behavior and the hypothetical PR-box 


We work in the CHSH Bell scenario. Rather than proceeding formally to find the 
intersection of the no-signaling facets, we introduce the PR-behavior in a heuristic way 
by asking what a behavior must satisfy in order to reach $S = 4$. The answer is clear: The
correlation coefficients must satisfy $E_{00} = E_{01} = E_{10} = -E_{11} = 1$, that is¹


$$a \oplus b = xy \quad \text{for } a, b \in \{0, 1\}, \tag{9.1}$$


where $a \oplus b = (a + b) \bmod 2$. There are sixteen deterministic behaviors, obviously
signaling, that satisfy (9.1):


$$P_\gamma(a, b|x, y) = \delta_{a=\gamma_{xy}}\, \delta_{b=\gamma_{xy} \oplus xy} \quad \text{with } \gamma = (\gamma_{00}, \gamma_{01}, \gamma_{10}, \gamma_{11}) \in \{0, 1\}^4. \tag{9.2}$$


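Let us prove that only one convex combination of these behaviors is no-signaling. Indeed,
for $P = \sum_\gamma q(\gamma) P_\gamma$ the no-signaling conditions (2.9) read


$$\begin{aligned}
P(a|0,0) = P(a|0,1): &\quad \sum_\gamma q(\gamma)\,\delta_{a=\gamma_{00}} = \sum_\gamma q(\gamma)\,\delta_{a=\gamma_{01}}, \\
P(a|1,0) = P(a|1,1): &\quad \sum_\gamma q(\gamma)\,\delta_{a=\gamma_{10}} = \sum_\gamma q(\gamma)\,\delta_{a=\gamma_{11}}, \\
P(b|0,0) = P(b|1,0): &\quad \sum_\gamma q(\gamma)\,\delta_{b=\gamma_{00}} = \sum_\gamma q(\gamma)\,\delta_{b=\gamma_{10}}, \\
P(b|0,1) = P(b|1,1): &\quad \sum_\gamma q(\gamma)\,\delta_{b=\gamma_{01}} = \sum_\gamma q(\gamma)\,\delta_{b=\gamma_{11} \oplus 1}.
\end{aligned}$$


Now, take $a = b = \xi$: Chaining the conditions implies that all the sums must be equal,
and the requirement $\sum_\gamma q(\gamma)\,\delta_{\xi=\gamma_{11}} = \sum_\gamma q(\gamma)\,\delta_{\xi=\gamma_{11} \oplus 1}$ fixes their values to be $\frac{1}{2}$. Thus,
in order to satisfy the no-signaling conditions, a convex combination of the $P_\gamma$ must have
unbiased marginals. This forces the unique solution²


$$\begin{aligned}
P_{PR}: \quad & P(a, b|0, 0) = P(a, b|0, 1) = P(a, b|1, 0) = \tfrac{1}{2}\,\delta_{a=0,b=0} + \tfrac{1}{2}\,\delta_{a=1,b=1}, \\
& P(a, b|1, 1) = \tfrac{1}{2}\,\delta_{a=0,b=1} + \tfrac{1}{2}\,\delta_{a=1,b=0}.
\end{aligned} \tag{9.3}$$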


1 The choice $a, b \in \{0, 1\}$ is almost universal in the literature, but of course is not constraining. If someone
prefers to work with $a, b \in \{+1, -1\}$, the analog of (9.1) is $ab = (-1)^{xy}$.

2 Indeed, for $xy = 0$, all $P_\gamma$ are such that $(a, b)$ is either (0, 0) or (1, 1): The requirement that Alice’s (Bob’s)
output must be unbiased forces to take both possibilities with the same weight. Similarly for $xy = 1$, where
$(a, b)$ is either (0, 1) or (1, 0).
This uniqueness, together with the fact that $S = 4$ is the algebraic maximum of CHSH,
implies that $P_{PR}$ cannot be realized by mixing other no-signaling behaviors: It is an
extremal point of the no-signaling polytope.
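The construction of the PR-behavior can be reproduced in a few lines. The sketch below builds the sixteen deterministic behaviors (9.2) as arrays `P[a, b, x, y]`, checks that each of them is signaling, and verifies that their uniform mixture (one convenient choice of weights $q(\gamma)$) coincides with (9.3), is no-signaling and reaches $S = 4$. The function names and the array layout are choices made for this illustration.

```python
import numpy as np
from itertools import product

BITS = (0, 1)


def deterministic(gamma):
    """P_gamma of (9.2): a = gamma_xy and b = gamma_xy XOR xy, with gamma indexed by 2x + y."""
    P = np.zeros((2, 2, 2, 2))
    for x, y in product(BITS, repeat=2):
        a = gamma[2 * x + y]
        P[a, a ^ (x & y), x, y] = 1.0
    return P


def pr_behavior():
    """The PR-behavior (9.3): uniform over the outputs satisfying a XOR b = xy."""
    P = np.zeros((2, 2, 2, 2))
    for a, b, x, y in product(BITS, repeat=4):
        if (a ^ b) == (x & y):
            P[a, b, x, y] = 0.5
    return P


def is_no_signaling(P):
    """Alice's marginal must not depend on y, nor Bob's on x."""
    alice = P.sum(axis=1)   # alice[a, x, y]
    bob = P.sum(axis=0)     # bob[b, x, y]
    return bool(np.allclose(alice[:, :, 0], alice[:, :, 1])
                and np.allclose(bob[:, 0, :], bob[:, 1, :]))


def chsh(P):
    """S = E00 + E01 + E10 - E11 with E_xy = sum over a, b of (-1)^(a XOR b) P(a, b|x, y)."""
    signs = np.array([[1, -1], [-1, 1]])
    E = np.einsum("ab,abxy->xy", signs, P)
    return E[0, 0] + E[0, 1] + E[1, 0] - E[1, 1]


gammas = list(product(BITS, repeat=4))
uniform_mixture = sum(deterministic(g) for g in gammas) / len(gammas)

print("every P_gamma is signaling:", not any(is_no_signaling(deterministic(g)) for g in gammas))
print("uniform mixture equals P_PR:", np.allclose(uniform_mixture, pr_behavior()))
print("P_PR is no-signaling:", is_no_signaling(pr_behavior()), " S =", chsh(pr_behavior()))
```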

The PR-behavior per se is nothing more than the answer to a mathematical question. 
However, in the context of quantum information theory, a change of perspective took 
place, and several works focused on the properties of a hypothetical resource that would 
exhibit such a behavior. This resource came to be called PR-box from the way it was 
usually drawn (Figure 9.1). Several results quickly followed (Barrett et al., 20050; Barrett 
and Pironio, 2005; Jones and Masanes, 2005; Masanes et al., 2006); it is beyond the scope 
of this book to review them in detail. Here we follow one story line: The conjecture that 
the PR-box could be the “unit of nonlocality”; the quest for the quantum principle that 
was the motivation of Popescu and Rohrlich (1994) is the subject of chapter 10.


9.2.2 Feats and failures of the PR-box as unit of nonlocality 


In entanglement theory (see Appendix C.2.3), the amount of entanglement is measured 
by how many singlets are needed to create a state (entanglement of formation), can 
be extracted from a given state (distillable entanglement), and so on. This is not 
a mere arbitrary choice. The resources are equivalent insofar as the transformation 
between singlets and any state is reversible in the limit of infinitely many copies. Also, 
with sufficiently many shared singlets, even multipartite states can be prepared with 
local operations and classical communication, through teleportation and entanglement 
swapping. The singlet is indeed the unit resource for entanglement. 

Could the PR-box play a similar role of unit of nonlocality? It does share several 
analogies with the singlet. First and obviously, it is maximally nonlocal in the simplest 
scenario (2, 2; 2, 2), just as the singlet is maximally entangled for the simplest Hilbert 
space $\mathbb{C}^2 \otimes \mathbb{C}^2$. Then it is monogamous or, seen as dynamics rather than kinematics, a no-cloning
theorem holds (Exercise 9.1). Possibly the most striking connection between the
PR-box and the singlet was provided in (Cerf et al., 2005). Consider the singlet behavior, 
i.e., the state-behavior that describes the statistics of all possible projective measurements 
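on the singlet:³


$$P_{\Psi^-} = \left\{ P(a, b|\hat{a}, \hat{b}) = \frac{1}{4}\left(1 - ab\, \hat{a} \cdot \hat{b}\right) \;\middle|\; a, b \in \{-1, +1\},\ \hat{a}, \hat{b} \in S^2 \right\}. \tag{9.4}$$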


Cerf and coworkers proved that this behavior can be simulated exactly if in each round 
the players share a PR-box alongside LVs. The explicit protocol is presented in
Appendix G.6 following the approach of (Degorre et al., 2005). This result is remarkable 
because the only nonlocal resource needed to reproduce the singlet behavior is a no- 
signaling process with binary input and binary outputs; the complexity of infinitely many 


3 It is of course the Werner behavior (3.24) for $W = 1$, and is related by local changes of basis to the
state-behavior (3.29) for $\theta = \frac{\pi}{4}$.
settings on a sphere is taken care of by the local variables.⁴ One is even tempted to go
beyond simulations and conjecture that this may be what a singlet “really” is: A PR- 
box complemented with local variables. It was later proved that, in order to simulate 
weakly-entangled two-qubit pure states, the PR-box must be queried more than once in 
each round (Brunner et al., 2005). This may not be an unwelcome feature for a unit of 
nonlocality, given that nonlocality and entanglement do not always behave monotonically 
(e.g., in the violation of some inequalities, section 4.3; the threshold efficiency for the 
detection loophole, Appendix B.3). That being said, no link was found between this 
result on simulations and those other observations. 

Finally, for bipartite scenarios the PR-box lives up to the role of unit resource (Barrett 
and Pironio, 2005; Jones and Masanes, 2005): Given sufficiently many PR-boxes shared 
between two players, all the extremal NS behaviors of those scenarios (that is the whole 
no-signaling polytope) can be generated by local operations. 

Unfortunately, the PR-box fails to play the same role when moving to multipartite 
Bell scenarios. Barrett and Pironio (2005) proved that some multipartite no-signaling 
behaviors can’t be simulated if the players initially share only bipartite PR-boxes, even 
in arbitrarily large numbers. The core of their proof is instructive and can be sketched 
here. Consider the behavior in the (2, 2; 2, 2; 2, 2) scenario defined by the five perfect 
correlations 


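$$\begin{aligned}
a_0 \oplus b_1 &= 0 \\
b_0 \oplus c_1 &= 0 \\
c_0 \oplus a_1 &= 0 \\
a_0 \oplus b_0 \oplus c_0 &= 0 \\
a_1 \oplus b_1 \oplus c_1 &= 1
\end{aligned} \tag{9.5}$$


all the other expectation values being maximally mixed. This behavior is a nonlocal⁵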
extremal point of the no-signaling polytope. Because it’s nonlocal, the players must rely 
on some nonlocal resource to simulate it. Our assumption is that they share only bipartite 
PR-boxes. Let’s put ourselves in Alice’s shoes, and suppose that in a given round she 
received the input x = 0. If Bob has received y = 1, Alice should get correlated only to him, 
and so she should not use the outputs of the PR-boxes she shares with Charlie. If Bob 
has received y = 0, Alice should be correlated with Bob and Charlie together: In this case, 
she cannot ignore the outputs of the PR-boxes she shares with Charlie. By symmetry, all 
players are in the same predicament: Thus, the behavior (9.5) can’t be simulated with 
those shared resources. While this behavior can’t be realized with quantum resources, a 
five-partite analog can, and the same kind of proof applies. So, with only shared bipartite 
PR-boxes one can’t even reconstruct all the possible multipartite quantum behaviors. 
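That the five perfect correlations (9.5) cannot be reproduced with local variables can be confirmed by brute force. The sketch below enumerates the 64 local deterministic assignments $(a_0, a_1, b_0, b_1, c_0, c_1)$ and counts how many of the five constraints each one satisfies; since any local behavior is a mixture of such assignments, it is enough that none of them satisfies all five.

```python
from itertools import product


def satisfied_constraints(a, b, c):
    """Number of the five relations (9.5) met by the deterministic outputs a[x], b[y], c[z]."""
    return sum([
        (a[0] ^ b[1]) == 0,
        (b[0] ^ c[1]) == 0,
        (c[0] ^ a[1]) == 0,
        (a[0] ^ b[0] ^ c[0]) == 0,
        (a[1] ^ b[1] ^ c[1]) == 1,
    ])


def best_local_score():
    """Maximum number of constraints satisfied by a local deterministic strategy."""
    best = 0
    for bits in product((0, 1), repeat=6):
        a, b, c = bits[0:2], bits[2:4], bits[4:6]
        best = max(best, satisfied_constraints(a, b, c))
    return best


print("constraints satisfiable by local deterministic strategies:", best_local_score(), "out of 5")
```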


4 This work was inspired by the model of Toner and Bacon (2003), that achieves the same with the simplest 
signaling resource, namely one bit of communication (see section 11.2). 

5 By now, the reader should have noticed that the nonlocality is manifest: The first four correlations would
imply $a_1 \oplus b_1 \oplus c_1 = 0$ for LHV.
This observation suggests that it is impossible to define the analog of teleportation
and entanglement swapping, because these are the protocols through which shared 
singlets can be used to generate any multipartite state. Indeed, the analog of a Bell state 
measurement for PR-boxes is problematic: It cannot be consistently defined over the 
whole no-signaling polytope (Skrzypczyk et al., 2009). In general, PR-boxes have poor 
dynamics: The set of transformations that can be applied to these hypothetical resources 
without leaving the framework is very limited (Barrett, 2007; Dall’Arno et al., 2017). 


9.2.3 An “implausible consequence” of having a PR-box 


PR-boxes may fall short of all the desiderata for a unit of nonlocality, but would 
nonetheless be very powerful resources. The first⁶ to highlight their power was Wim 
van Dam, in the context of communication complexity (van Dam, 2013). Consider the 
following game: The verifier draws two strings of $n$ bits $x, y \in \{0, 1\}^n$ uniformly at random, 
and sends $x$ to Alice and $y$ to Bob. Contrary to the rules of a Bell test, here the players can 
exchange $m$ bits of communication. The goal is for Alice to output the value of a function 
$f(x, y)$. 

Any such game can be won if $m = n$, because Bob can send $y$ to Alice. For some 
choices of $f$, a much smaller amount of communication is sufficient: For instance, the 
parity sum $f(x,y) = \bigoplus_{k=1}^{n} (x_k \oplus y_k) = X \oplus Y$ with $X = \bigoplus_{k=1}^{n} x_k$ and $Y = \bigoplus_{k=1}^{n} y_k$. Bob 
can compute locally the parity $Y$ of his own string and send this single bit to Alice, who 
can then compute $f$. But it is not surprising, and can be rigorously proved, that, given a 
suitable choice of $f$, the game can be won perfectly only if the players can exchange $n$ bits. 
One such choice is the inner product $f(x,y) = x \cdot y = \bigoplus_{k=1}^{n} x_k y_k$. 

However, if the players share a PR-box, one single bit of communication would be 
sufficient to win the game perfectly. Indeed, if Alice and Bob input sequentially all the 
$(x_k, y_k)$ into the PR-box, due to (9.1) the corresponding outputs $(a_k, b_k)$ will satisfy 
$\bigoplus_{k=1}^{n} x_k y_k = \bigoplus_{k=1}^{n} (a_k \oplus b_k)$. In other words, the PR-box has transformed products of 
bits into sums. At this point, to win the game Bob needs to send the single bit $B = \bigoplus_{k=1}^{n} b_k$ 
to Alice, who adds it to the parity of her own outputs. 
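
The argument above can be checked with a short simulation. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, and the PR-box is emulated by a single function that sees both inputs, which is a bookkeeping device for the simulation and not a claim about how such a resource would be built.

```python
import random


def pr_box(x, y, rng):
    """Emulated PR-box (assumption: one function sees both inputs).

    Returns (a, b) with a XOR b == x AND y and uniformly random marginals.
    """
    a = rng.randint(0, 1)
    b = a ^ (x & y)
    return a, b


def inner_product(xs, ys):
    """Reference value f(x, y) = XOR over k of x_k * y_k."""
    result = 0
    for xk, yk in zip(xs, ys):
        result ^= xk & yk
    return result


def inner_product_with_pr_boxes(xs, ys, rng):
    """Alice's answer when Bob sends only the parity of his PR-box outputs."""
    parity_alice = 0  # parity of Alice's outputs a_k, known to Alice only
    parity_bob = 0    # parity of Bob's outputs b_k: the single bit B he sends
    for xk, yk in zip(xs, ys):
        a, b = pr_box(xk, yk, rng)
        parity_alice ^= a
        parity_bob ^= b
    # Alice combines her own parity with the one bit received from Bob.
    return parity_alice ^ parity_bob


if __name__ == "__main__":
    rng = random.Random(2024)
    xs = [rng.randint(0, 1) for _ in range(8)]
    ys = [rng.randint(0, 1) for _ in range(8)]
    print(inner_product_with_pr_boxes(xs, ys, rng) == inner_product(xs, ys))
```

Whatever the length of the strings, the only communication in this sketch is the single bit `parity_bob`.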

Now, any Boolean function can be written as a sum of products. Thus, the PR-box 
trivializes communication complexity, which van Dam deemed an “implausible 
consequence,” all the more because shared entanglement would not lead to any improvement.⁷ 
This work paved the way for a series of studies with a much more ambitious scope: 
Finding implausible consequences not just for the PR-box, but for every hypothetical 
no-signaling resource that produces behaviors outside the quantum set. This will be the 
topic of chapter 10. 


6 Van Dam noticed this point in his Ph.D. thesis, defended in Oxford in the year 2000, and posted it on 
the arXiv in 2005, but was discouraged from submitting it for publication by more senior scientists. The argument is 
almost trivial indeed, but it became very influential, as we shall see in chapter 10. It was eventually published in 
2013, as cited in the text. Nobody was harmed: The result was accessible, priority has always been recognized, 
and Wim holds a permanent position. 

7 Entanglement does help in some communication complexity tasks, even providing an exponential 
reduction of the required communication; but not for the inner product function (Buhrman et al., 2010). 
9.3 Device-Independent Certification Reloaded 


Device-independent certification can now be revisited by relaxing the requirement that 
the adversary’s knowledge and technical skills are based on quantum theory (subsection 
8.1.2). Now, our adversary masters a theory of no-signaling resources, in which all no- 
signaling behaviors can be produced. 


9.3.1 Randomness generation 


For randomness generation, the adversary can describe the process with her theory of 
no-signaling resources, and can run a simulation based on such resources—for short, we 
have an adversary with no-signaling description. 

Under the i.i.d. assumption, the study of randomness is simpler than in the quantum 
case. Since the NS set is a polytope, the constraint that behaviors must be no-signaling is a 
linear constraint, so the analog of the optimization (8.8) becomes a linear program. Even 
more, that constraint can be implemented automatically by decomposing the behavior 
on the set of extremal behaviors, which are finitely many and known—there is still an 
optimization to be performed, since the number of extremal behaviors is larger than $D_{NS}$, 
thus a generic behavior may admit several decompositions. The result of the optimization 
is a tight upper bound⁸ on $P_{\rm guess}$ for our Eve. 

In the (2, 2; 2, 2) scenario, the study of randomness against an adversary with no- 
signaling description is almost trivial due to the geometry of the NS set. Recall that 
there are only eight extremal nonlocal behaviors, all versions of the PR-behavior (9.3). 
A decomposition of the observed behavior into extremal ones determines the fraction $q_{PR}$ 
of rounds in which the nonlocal behaviors are used. In those rounds, there is randomness 
for Eve, whereas she knows everything when deterministic behaviors are used. The 
optimization thus consists in minimizing $q_{PR}$. Given $S_{obs} > 2$ for one version of CHSH, 
only the PR-behavior that lies on top of that same local facet contributes to the violation; 
as for the others, six give $S = 0$ and one $S = -4$. Obviously $q_{PR}$ is minimized by 
decompositions that use only that PR-behavior: The minimal value is determined by⁹ 


$$S_{obs} = q_{PR}\,4 + (1 - q_{PR})\,2, \quad \text{i.e.,} \quad q_{PR} = \frac{S_{obs} - 2}{2}. \qquad (9.6)$$


8 It is still an upper bound, insofar as it is computed from the decomposition that is the most favorable for 
Eve. Eve may know that the actual decomposition is another one and not be able to guess so well. 

9 Contrary to the quantum case, here all that matters is the value of Sobs: The detailed knowledge of the 
behavior won’t give any improvement, since every decomposition of the local part into deterministic behaviors 
is equivalent for Eve’s guessing. In other words, randomness is uniquely determined by the “local fraction” 
(subsection 9.4.1). That being said, this is an accident of the (2, 2; 2, 2) scenario that does not extend to 
scenarios with more nonlocal extremal behaviors. 
Let us then first consider the randomness of Alice’s output for one of the inputs. For a 
PR-behavior, Eve’s guessing probability is $\frac12$. Thus $P_{\rm guess} \le q_{PR}\,\frac12 + (1 - q_{PR})$, which is 


$$P_{\rm guess} \le \frac32 - \frac{S_{obs}}{4}. \qquad (9.7)$$


We observe that $P_{\rm guess} < 1$ for all $S_{obs} > 2$: As soon as there is some nonlocality, there is 
randomness in a no-signaling theory. On the other hand, at the Tsirelson bound $S_{obs} = 2\sqrt2$ 
we do not have $P_{\rm guess} = \frac12$ as in the quantum case (8.15). This is expected: The 
same data certify less randomness for an adversary with more power. Turning now to the 
randomness of the joint outputs of Alice and Bob, $(a_0, b_0)$, we notice that Eve’s guessing 
probability for a PR-behavior is also $\frac12$ because of the perfect correlation. Therefore, 
the bound on Eve’s guessing probability is again (9.7). This is significantly different 
from the quantum case, where we have seen that there is more randomness in the joint 
outputs. In fact, for any Bell scenario, contrary to the quantum case (subsection 8.3.3) 
it is impossible to extract maximal randomness against an adversary with no-signaling 
description (de la Torre et al., 2015). 
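
As a numerical illustration of (9.6) and (9.7) in the (2, 2; 2, 2) scenario, the following Python sketch evaluates the minimal PR fraction and the resulting bound on Eve's guessing probability from an observed CHSH value. The function names are hypothetical, and the formulas are only meant for $2 \le S_{obs} \le 4$.

```python
import math


def pr_fraction(s_obs):
    """Minimal PR fraction q_PR = (S_obs - 2) / 2, from (9.6)."""
    if not 2.0 <= s_obs <= 4.0:
        raise ValueError("expected 2 <= S_obs <= 4")
    return (s_obs - 2.0) / 2.0


def guessing_probability_ns(s_obs):
    """Bound (9.7): Eve guesses with 1/2 on PR rounds, with certainty otherwise."""
    q = pr_fraction(s_obs)
    return q * 0.5 + (1.0 - q)


if __name__ == "__main__":
    for s in (2.0, 2.0 * math.sqrt(2.0), 4.0):
        print(f"S_obs = {s:.4f}  q_PR = {pr_fraction(s):.4f}  "
              f"P_guess <= {guessing_probability_ns(s):.4f}")
```

At the Tsirelson bound the sketch returns a value strictly between $\frac12$ and 1, in line with the remark that the same data certify less randomness against a more powerful adversary.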


9.3.2 Key distribution 


The idea of securing key distribution (KD) against a no-signaling adversary is what 
triggered the awareness of device-independent certification (Appendix F1.2). It is 
relatively easy to deal with an adversary with an i.i.d. no-signaling description.¹⁰ The main 
difference from the previous subsection is the fact that, while randomness is a property 
of the behavior alone, the security of KD depends on the protocol too. A protocol based 
on CHSH was studied in (Acin et al., 2006; Scarani et al., 2006). 

However, in KD we want to have a full-fledged no-signaling adversary, able to 
manipulate the corresponding side-information. Against such an adversary, it is provably 
impossible to define extractors along the usual lines (Arnon-Friedman and Ta-Shma, 
2012; Hänggi et al., 2013). Maybe unconventional constructions could lead to more 
optimistic results, but most of the community seems to take these results as an indication 
that security is actually impossible.¹¹ 


9.4 Refinements on Quantum Indeterminacy 


We know since chapter 1 that, barring the adoption of higher stances leaning towards full 
determinism, Bell nonlocality implies intrinsic indetermination if signaling is not allowed; 
we have just been reminded of it in the certification of randomness. Here we are going 


10 This is the analog of proving security against individual attacks in QKD. 

11 Also, one should not forget that we are trying to base a security proof on the knowledge of the behaviors 
alone, and not on the underlying theory: To the best of my knowledge, it is not even clear that security against 
a quantum adversary could be proved in these conditions. 
to see that more refined features of quantum indeterminacy can be recovered in the no-signaling 
framework. 

Concretely, in a classical theory, the state of a composite system is pure if and only 
if the state of each component is pure. This is famously false in quantum theory. We 
are going to see that two aspects of this statement are not just formal consequences 
of quantum theory, but necessities of any no-signaling theory that contains the singlet 
behavior (9.4). 


9.4.1 The notion of local fraction 


In a paper that is often referred to as EPR2, Elitzur, Popescu, and Rohrlich (1992) 
asked whether a source of entangled pairs may be actually two ensembles of pairs, one 
of which is made of classically correlated objects. This physically-posed question has a 
natural counterpart in the setting of games and simulations: If a behavior P has a local 
component, the players will have to worry only about a fraction of the rounds. 
Formally, any no-signaling behavior P can be decomposed as the convex sum 


$$P = p\,P_L + (1 - p)\,P_{NS} \qquad (9.8)$$


where $p \in [0, 1]$ and $P_L$ is local. The behavior $P_{NS} = \frac{P - p\,P_L}{1 - p}$ is only requested to be a 
valid probability distribution; it is no-signaling by construction, since both $P$ and $P_L$ are. 
The local fraction $p_L$ is defined as 


$$p_L = \max_{P_L} \{\, p : \text{(9.8) holds} \,\}. \qquad (9.9)$$


Clearly, $P$ is nonlocal if and only if $p_L < 1$. In fact, the violation $I(P)$ of any Bell 
inequality puts an upper bound on $p_L$ (Barrett et al., 2006). Indeed, from (9.8) it follows 
immediately that $I(P) \le p\,I_L + (1 - p)\,I_{NS}$, where $I_L$ and $I_{NS}$ are the maximal values of $I$ 
over local and no-signaling behaviors respectively, whence 


$$p_L \le \frac{I_{NS} - I(P)}{I_{NS} - I_L}. \qquad (9.10)$$


For behaviors in finite Bell scenarios, $p_L$ can be computed algorithmically by testing all 
the Bell inequalities, and the corresponding $P_L$ will be found as the point on a facet of the 
local polytope that is nearest to $P$. For instance, the quantum behavior that maximizes 
CHSH has local fraction $p_L = 2 - \sqrt2 \approx 0.59$. 
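
The bound (9.10) is simple enough to evaluate directly. The sketch below, with a hypothetical function name, computes the upper bound on the local fraction from an observed Bell value; it does not perform the full optimization over decompositions, so in general it only bounds $p_L$ from above. For CHSH the local and no-signaling maxima are taken to be 2 and 4.

```python
import math


def local_fraction_upper_bound(i_obs, i_local, i_ns):
    """Upper bound (9.10) on the local fraction, clipped to [0, 1].

    i_obs:   observed value I(P) of the Bell expression
    i_local: maximal value over local behaviors
    i_ns:    maximal value over no-signaling behaviors
    """
    if i_ns <= i_local:
        raise ValueError("the inequality must be violable by no-signaling behaviors")
    bound = (i_ns - i_obs) / (i_ns - i_local)
    return min(1.0, max(0.0, bound))


if __name__ == "__main__":
    tsirelson = 2.0 * math.sqrt(2.0)
    print(local_fraction_upper_bound(tsirelson, 2.0, 4.0))
```

For the CHSH value $2\sqrt2$ the bound evaluates to $2 - \sqrt2$, the value quoted above for the behavior that maximizes CHSH.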

One can also ask what the local fraction of a state is by considering the state behavior. 
At the moment of writing, no algorithmic approach has been found for this problem. 
One has first to conjecture that a value of $p_L$, maybe an upper bound (9.10), is optimal; 
then guess a form of the local behavior $P_L$. With this, one checks if the derived $P_{NS}$ is a 
valid probability distribution (Exercise 9.2). This approach has been successfully carried 
out only for the family of pure entangled states of two qubits (3.15) and for projective 
measurements, for which the local fraction is $p_L = 1 - \cos 2\theta$ (Portmann et al., 2012). 
Unfortunately, the construction of the local behavior $P_L$ is an awkward tour de force, 
hardly inspiring generalization. As we mentioned in subsection 3.3.1, it remains an open 
problem to determine exactly the locality of the Werner behaviors (3.24). Beyond the 
case of qubits, the local fraction is virtually unexplored (Scarani, 2008). 

Nonetheless, the case of the singlet is remarkable, as already proved in the EPR2 
paper. For the singlet behavior (9.4), the chained inequalities lead to $I(P) = I_{NS}$ in the 
limit of large $M$ (subsection 4.2.3), hence $p_L = 0$. In other words, the singlet behavior is 
fully nonlocal.¹² 


9.4.2 The randomness of the marginal distributions 


The observation that the singlet behavior is fully nonlocal does not imply that it 
is extremal in the no-signaling polytope: As we have seen in section 9.2, it can be 
decomposed on a family of behaviors described by a PR-box complemented with local 
variables. That decomposition is such that 


$$P_\lambda(a|\vec a) = P_\lambda(b|\vec b) = \frac12. \qquad (9.11)$$


Even knowing $\lambda$, the local statistics are maximally random. This is analogous to the 
situation in quantum theory, where the partial states of maximally entangled states are 
maximally mixed. But is this a necessity? Can’t one decompose the singlet behavior on 
a family of no-signaling behaviors with biased marginals? Without denying indetermina- 
tion, such a model would at least recover some classical flavor. 

The question was first posed by Tony Leggett (2003). Assuming $P_\lambda(a|\vec a) = \frac12(1 + a\,\vec u_\lambda \cdot \vec a)$ 
and $P_\lambda(b|\vec b) = \frac12(1 + b\,\vec v_\lambda \cdot \vec b)$ with $\|\vec u_\lambda\| = \|\vec v_\lambda\| = 1$, that is maximal bias with 
the standard dependence on the measurement direction,¹³ he derived an inequality 
that is violated by the singlet behavior. Leggett’s original inequality uses infinitely 
many inputs, but another can be found with finitely many inputs, suitable for 
experimental verification (it is worth noting that these tests are not device-independent). These 
developments generated some excitement at the time, see (Branciard et al., 2008) and 
references therein. Eventually, the study of Leggett-type models was superseded by a 
device-independent proof that every no-signaling decomposition of the singlet behavior 
must satisfy (9.11) (Colbeck and Renner, 2008). 

The proof uses the variational distance between two probability distributions on the 
same alphabet $\mathcal C$, defined as¹⁴ 


12 In fact, any sub-behavior including all the input vectors lying in a plane is already fully nonlocal. 

13 In particular, this model is no-signaling, as each player’s marginal does not depend on the other player’s 
input. 

14 For readers familiar with quantum theory, it is the classical analog of the trace distance for states, and has 
the same operational interpretation as “guessing advantage.” 
$$D[P_C, Q_C] = \frac12 \sum_{c \in \mathcal C} |P_C(c) - Q_C(c)|. \qquad (9.12)$$


The same definition carries over immediately to conditional probabilities, which we shall 
denote Pejo. The properties of the variational distance used in this proof are proved in 
Appendix G.7. 

The first property we are going to use is the following: If the two distributions are 
marginals of a joint distribution with identical alphabets A = B, then 


$$D[P_A, P_B] \le P_{AB}(a \ne b). \qquad (9.13)$$
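
The definition (9.12) and the property (9.13) can be explored numerically. The Python sketch below uses hypothetical helper names and represents a joint distribution as a nested list `joint[a][b]`; it samples a random joint distribution on identical alphabets and compares the variational distance of its marginals with the probability that the two outcomes differ. This is a sanity check on examples, not a substitute for the proof in Appendix G.7.

```python
import random


def variational_distance(p, q):
    """D[P, Q] = (1/2) * sum over c of |P(c) - Q(c)|, as in (9.12)."""
    return 0.5 * sum(abs(pc - qc) for pc, qc in zip(p, q))


def marginals(joint):
    """Marginals of joint[a][b] = P_AB(a, b) on identical alphabets."""
    size = len(joint)
    p_a = [sum(joint[a][b] for b in range(size)) for a in range(size)]
    p_b = [sum(joint[a][b] for a in range(size)) for b in range(size)]
    return p_a, p_b


def prob_different(joint):
    """P_AB(a != b): total weight off the diagonal."""
    size = len(joint)
    return sum(joint[a][b] for a in range(size) for b in range(size) if a != b)


def random_joint(size, rng):
    """Random normalized joint distribution (illustrative sampling choice)."""
    weights = [[rng.random() for _ in range(size)] for _ in range(size)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


if __name__ == "__main__":
    rng = random.Random(7)
    joint = random_joint(3, rng)
    p_a, p_b = marginals(joint)
    print(variational_distance(p_a, p_b) <= prob_different(joint))
```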


Let us then start from the chained inequality in its form (4.7) applied to one of the $P_\lambda$. 
Using (9.13), we have 


$$C_M(P_\lambda) \ge D[P_{A|1,1,\lambda}, P_{B|1,1,\lambda}] + D[P_{B|2,1,\lambda}, P_{A|2,1,\lambda}] + D[P_{A|2,2,\lambda}, P_{B|2,2,\lambda}] + \dots$$
$$\dots + D[P_{A|M,M,\lambda}, P_{B|M,M,\lambda}] + D[P_{B|1,M,\lambda}, 1 - P_{A|1,M,\lambda}]$$


with the notation $P_{A|x,y,\lambda}$, and analogously for Bob. No-signaling implies $P_{A|x,y,\lambda} = P_{A|x,\lambda}$ and 
$P_{B|x,y,\lambda} = P_{B|y,\lambda}$. Having made this replacement, we can use the fact that the distance 
satisfies the triangle inequality $D[P, Q] + D[Q, R] \ge D[P, R]$ to obtain for instance 


$$D[P_{A|x=1,\lambda}, P_{B|y=1,\lambda}] + D[P_{B|y=1,\lambda}, P_{A|x=2,\lambda}] \ge D[P_{A|x=1,\lambda}, P_{A|x=2,\lambda}].$$

By iterating the triangle inequality we eventually reach 

$$C_M(P_\lambda) \ge D[P_{A|x=1,\lambda}, \bar P_{A|x=1,\lambda}] \qquad (9.14)$$


where $\bar P_{A|x,\lambda}$ is defined by $\bar P_\lambda(a|x) = 1 - P_\lambda(a|x)$. The r.h.s. can be computed explicitly 
and we finally find 


$$C_M(P_\lambda) \ge \left|P_\lambda(a=0|x=1) - \frac12\right| + \left|P_\lambda(a=1|x=1) - \frac12\right|. \qquad (9.15)$$


At this point, recall that for the singlet behavior (9.4) it holds $C_M(P) = 0$ in the limit of large $M$. Since this value 
is the minimum in the no-signaling set, if $C_M(P) = \int d\lambda\, Q(\lambda)\, C_M(P_\lambda)$ it must be the case 
that $C_M(P_\lambda) = 0$ for all $\lambda$. Plugging this in (9.15) forces $P_\lambda(a=0|x=1) = P_\lambda(a=1|x=1) = \frac12$. 
The proof for all other inputs $x = 2, \dots, M$ and for Bob is exactly the same: One 
just has to keep the suitable terms when iterating the triangle inequality. Finally, to prove 
(9.11) for any two $\vec a$ and $\vec b$, one has just to test the chained inequality with inputs in the 
plane that contains those two vectors. 
EXERCISES 


Exercise 9.1. Consider a hypothetical three-partite resource, such that the marginals Alice-Bob 
and Alice-Charlie are the PR-behavior: $a \oplus b = xy$ and $a \oplus c = xz$. 


(a) Prove that such a three-partite resource is signaling. Hint: Suppose that Bob and Charlie 
get together: What is $b \oplus c$? 

(b) Consider adding noise to the PR-behaviors: Now, $a \oplus b = xy$ holds with probability 
$1 - q$, while with probability $q$ we have $a \oplus b = xy \oplus 1$, still with unbiased marginals; 
and the same for Alice-Charlie. What is the minimal value of $q$ such that the 
three-partite behavior is no-signaling? 


Exercise 9.2. Take the state behavior (3.29) associated to projective measurements on a pure, 
non-maximally entangled two-qubit state. Prove that the EPR2 decomposition (9.8) is valid 
with the guess 


a 1 
Pr(a; Bla, b) = gu taflas)1 + pfe] with 


Fad = sign(w) min (1. — ma) 


as long as p < 1 — s (Scarani, 2008). Remark: It was proved that this is the highest pL 
achievable with a product PL. 
The Quest for Device-Independent 
Quantum Principles 


Considerate la vostra semenza: fatti non foste a viver come bruti, ma per seguir virtute 
e canoscenza. 


Consider your origin; you were not born to live like brutes, but to follow virtue and 
knowledge. 


Dante, Inferno 


No-signaling defines a larger set of behaviors than the quantum set: The additional 
constraints that define the quantum set could then be taken as physical principles satisfied 
in our universe (insofar as we know). This chapter reviews the attempts to identify such 
principles. 


10.1 Context: The Definition of Quantum Theory 


Quantum theory appeared in 1926 from the works of Schrödinger and Heisenberg, after 
two decades of attempts to go beyond classical physics. By the end of that same year, 
Born and Jordan had stressed that the two approaches are actually two versions of the 
same formalism. That formalism is the one we still use today. It is widely accepted that the 
core difference between classical and quantum theories lies in the description of a physical 
system, whereas the axioms about evolution are basically the same. In classical theory, 
the space of states is a set: Physical properties are associated to subsets, pure states are 
the minimal subsets i.e., points. In quantum theory, the space of states is a vector space: 
Physical properties are associated to subspaces, pure states are the minimal subspaces 
i.e., one-dimensional ones or rays (more in Appendix C.1.1). This is, in essence, the 
definition of quantum theory. Since very early on, people have tried to “make sense” of 
this definition. This endeavour bears of course the marks of some subjectivity: There is 
no strict consensus about what “makes sense.” 

The major attempt in this direction is the program of quantum logic, initiated by von 
Neumann. The goal of the program is to reconstruct quantum theory: More precisely, to 
single it out by adding suitable axioms in a very broad class of theories. Interestingly, 
this program managed to find a definition of classical theory: The space of states is a 
set if, for every physical property, the question “does the system in this state possess 
this property?” is answered by either yes or no with certainty. Clearly our daily intuition 
requests this, and just as clearly quantum theory denies it. The long sought definition 
of quantum theory, however, took longer to be found. A breakthrough was made when, 
inspired by quantum information science, people started adding axioms that refer to the 
description of composite systems. Currently, finite-dimensional quantum theory can be 
fully reconstructed within the framework of “generalized probabilistic theories” (GPTs) 
through several sets of axioms, all of which have an operational flavor (as to whether this 
is enough to “make sense,” it’s each one’s call). The recent book by D’Ariano, Chiribella, 
and Perinotti (2017) starts with a very informative review of all this history and proceeds 
to present one of these reconstructions. 

In 1994, Popescu and Rohrlich asked whether the principle that defines quantum 
theory could possibly be Bell nonlocality without signaling (Popescu and Rohrlich, 1994). 
The PR-behavior was their counterexample. With the passing of the years, it was realized 
that their approach set the quest for “making sense” of quantum theory in a different 
direction: Instead of singling out quantum theory in a class of theories, one could try 
and single out quantum behaviors in the no-signaling set by adding further principles to 
that of no-signaling. None of the proposed principles actually fulfils the goal perfectly. 
We are going to study the three main proposals:¹ Information Causality (section 10.2), 
Macroscopic Locality (section 10.3) and Local Orthogonality (section 10.4). 

Before delving into these, let me stress that the two approaches are complementary. 
On the one hand, working only with behaviors, one can define principles that are 
device-independent; whereas the construction of a GPT assumes that one knows the 
dimensionality of the system under study, in the form of fiducial sets of data that contain 
all the information available on the system. On the other hand, identifying a set of 
behaviors does not yet identify the underlying theory, possibly theories, that lead to that 
set.² One could try and merge the two approaches: There are some very preliminary 
results in this direction, too technical to be reviewed in this book. 


10.2 Information Causality 


10.2.1 The basic task: Random access code 


Like van Dam’s insight with communication complexity (subsection 9.2.3), Information 
Causality is defined through a communication task assisted by no-signaling resources. 
The task is the so-called random access code (Figure 10.1, top). We consider the simplest 
example. Alice’s input consists of a pair of bits $(x_0, x_1) \in \{0, 1\}^2$ drawn uniformly at 


1 Their names sound like titles of episodes of the sitcom The Big Bang Theory. I am not aware of any causal 
link, besides the fact that the sitcom was very accurate in its depiction of academia and its jargon. 

2 At the cost of repeating ourselves: The approach through behaviors does not aim at singling out “quantum 
theory among no-signaling theories,” as casually stated in many papers on the subject. 
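[Figure 10.1. Labels recoverable from the diagram: limited amount of communication from Alice to Bob (top); Alice's PR-box input $x_0 \oplus x_1$, Bob's PR-box input $y$, message $m = a \oplus x_0$, Bob's output $\beta = m \oplus b$ (bottom).]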


Figure 10.1 Top: The task that inspires Information Causality is a delocalized random access code: 
Bob’s output $\beta$ is supposed to be equal to $x_y$, the element of Alice’s string indexed by Bob’s input. The 
task is made non-trivial by allowing a limited amount of communication from Alice to Bob. Bottom: 
With the use of a PR-box (a no-signaling resource), Bob can successfully output $\beta = x_y$ in every round, 
with Alice sending him only one bit of communication m. 


random among the four possible values. Bob’s input is a bit $y \in \{0, 1\}$ unknown to Alice 
a priori.³ Bob is asked to output $x_y$. 

If Alice cannot send any information to Bob, Bob’s output will obviously be uncor- 
related with Alice’s inputs. If Alice can send two bits, Bob’s output can be correlated as 
desired. The interesting case is when Alice is restricted to send only one bit. Without shared 
resources, it’s clear what the best strategy is: Alice always sends one of her bits, say $x_0$; 
Bob outputs the bit he receives: If $y = 0$, his output is the desired one, while if $y = 1$ his 
output is uncorrelated with the desired answer $x_1$. 

But could Alice and Bob be more successful if they shared no-signaling 
resources? Intuition suggests a negative answer: The piece of information that Bob needs 
is with Alice, it can’t arrive to him without communication, so a no-signaling resource 
should not help. However, this intuition is wrong: If Alice and Bob could share a PR-box, 
they could win every round of the game! 


10.2.2 The power of PR-boxes, again 


Here goes the protocol (Figure 10.1, bottom): Alice inputs $x = x_0 \oplus x_1$ in the PR-box; 
she gets the output $a$ and sends to Bob the single-bit message $m = a \oplus x_0$. Bob inputs 
$y$ in the PR-box, gets the output $b$ and outputs $\beta = m \oplus b = (a \oplus b) \oplus x_0$. By the rule 
(9.1) of the PR-box, $a \oplus b = (x_0 \oplus x_1)y$; so $\beta = x_0 \oplus [(x_0 \oplus x_1)y] = x_y$ as claimed. 
Notice that Bob had to put $y$ into the PR-box to retrieve $b$: Consequently, he gains 
no information on the other bit $x_{1-y}$. In this sense, nothing has gone blatantly wrong: 
Alice sent one bit and Bob got one bit. The PR-box has not increased the amount 
of information transferred between the players, which is as it should be since it is a 


3 If one adds the requirement that Alice is forbidden to know Bob’s choice even a posteriori, the task is known 
as oblivious transfer. We don’t enforce this requirement here. 
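[Figure 10.2. Labels recoverable from the diagram: first PR-box with input $x_0 \oplus x_1$ and $m_0 = a_0 \oplus x_0$; second PR-box with input $x_2 \oplus x_3$ and $m_1 = a_1 \oplus x_2$; third PR-box with input $m_0 \oplus m_1$ and message $m = a \oplus m_0$.]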
a 


Figure 10.2 Nested protocol to retrieve one out of $N = 4$ bits $(x_0, x_1, x_2, x_3)$ with one bit of 
communication, that works perfectly for the PR-box. Referring to the basic protocol sketched in Figure 
10.1 bottom, we see that $m_0$ (respectively, $m_1$) is what Alice should send to Bob to retrieve either $x_0$ or $x_1$ 
(respectively, either $x_2$ or $x_3$). Using the third box and the basic protocol, Bob can retrieve either $m_0$ or 
$m_1$: With this piece of information, he can then retrieve the desired $x_y$ querying either the first or the 
second box (this last step is not shown). 


no-signaling resource. It is nevertheless puzzling that Bob can retrieve perfectly 
whichever bit he chooses to read out: It looks as if both bits had been transferred to 
his location, even if ultimately he is allowed to access only one.⁴ Information Causality 
(IC) basically states that this should not be possible (Pawłowski et al., 2009). 
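
The single-box protocol of Figure 10.1 (bottom) can be verified by brute force over all inputs. In this Python sketch the names are hypothetical and the PR-box is emulated by one function that sees both inputs; this is an assumption of the simulation only, since the players themselves never exchange anything but the bit $m$.

```python
import random


def pr_box(x, y, rng):
    """Emulated PR-box: a XOR b == x AND y, with uniformly random marginals."""
    a = rng.randint(0, 1)
    return a, a ^ (x & y)


def random_access_code(x0, x1, y, rng):
    """Bob's output beta when Alice holds (x0, x1) and Bob wants x_y."""
    a, b = pr_box(x0 ^ x1, y, rng)  # Alice inputs x0 XOR x1, Bob inputs y
    m = a ^ x0                      # the single bit sent from Alice to Bob
    return m ^ b                    # beta = x0 XOR [(x0 XOR x1) AND y] = x_y


if __name__ == "__main__":
    rng = random.Random(3)
    ok = all(
        random_access_code(x0, x1, y, rng) == (x0, x1)[y]
        for x0 in (0, 1) for x1 in (0, 1) for y in (0, 1)
        for _ in range(20)
    )
    print(ok)
```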

Before formally defining IC, let us indulge further in the power of the PR-box by 
generalizing the random access code to more inputs. Consider $x \in \{0, 1\}^4$ and $y \in \{0, 1, 2, 3\}$, 
each drawn with uniform distribution. Bob can retrieve perfectly each of the four 
bits even if Alice communicates only one bit to him, provided they share three PR-boxes. This 
is done with the following nested protocol (Figure 10.2): 


• Alice inputs $x_0 \oplus x_1$ in the first PR-box, and gets the output $a_0$; she inputs $x_2 \oplus x_3$ 
in the second PR-box, and gets the output $a_1$. Finally, Alice inputs $m_0 \oplus m_1 = (a_0 \oplus x_0) \oplus (a_1 \oplus x_2)$ 
into the third PR-box and gets the output $a$. She sends the single bit 
$m = m_0 \oplus a$ to Bob. 


• Bob has received $y$, which we can write in binary notation as $y = 2y_1 + y_0$ with 
$y_0, y_1 \in \{0, 1\}$. He inputs $y_1$ in the third PR-box, and upon receiving the output 
$b$ he can retrieve $m_{y_1} = m \oplus b$. Now, if $y_1 = 0$ Bob must retrieve either $x_0$ or $x_1$: 
But he has just retrieved $m_0$, so he can apply the protocol of Figure 10.1 bottom by 
inputting $y_0$ to the first PR-box. Similarly, if $y_1 = 1$ Bob must retrieve either $x_2$ or 

4 Or, following a different narrative: Suppose that Alice wants to send two books to Bob, but her 
communication channel allows her to send only one. We expect that she would have to make a choice. But 
with PR-boxes, she could encode information about both books, in such a way that Bob can retrieve the one of 
his choice. 
$x_3$: Having just retrieved $m_1$, he can apply the protocol of Figure 10.1 bottom by 
inputting $y_0$ to the second box. Notice that Bob does not need to query one of the 
boxes. 


This construction generalizes to a random access code with $x \in \{0,1\}^N$ and $y \in \{0, 1, \dots, N - 1\}$ 
for every $N = 2^n$ and $n \in \mathbb{N}$. The nested protocol has then $n$ layers and requires 
$2^{n-k}$ PR-boxes in the $k$-th layer, for a total of $N - 1$ PR-boxes. Using this protocol, with 
just one bit of communication from Alice, Bob can retrieve any of her N input bits with 
certainty, while learning nothing about the others. In any layer, Bob has to query only 
one box. 
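
The nested construction for $N = 2^n$ can be sketched as follows. The names are hypothetical, and the emulation records Alice's side of every PR-box so that Bob's output can be generated when he queries a box; this is one convenient way to simulate the no-signaling statistics, assumed here for illustration. Bob queries exactly one box per layer, starting from the last layer and using the bits of $y$ from the most significant to the least significant.

```python
import random


def alice_encode(x, rng):
    """Run Alice's side of the nested protocol on a string of N = 2^n bits.

    Returns the single message bit m and, for each layer, the list of
    (box_input, box_output) pairs on Alice's side.
    """
    layers = []
    current = list(x)
    while len(current) > 1:
        layer, next_level = [], []
        for j in range(0, len(current), 2):
            box_input = current[j] ^ current[j + 1]
            a = rng.randint(0, 1)               # uniformly random PR-box output
            layer.append((box_input, a))
            next_level.append(a ^ current[j])   # message of the basic protocol
        layers.append(layer)
        current = next_level
    return current[0], layers


def bob_decode(m, y, layers):
    """Bob retrieves x_y from m by querying one emulated PR-box per layer."""
    value = m
    for k in range(len(layers) - 1, -1, -1):
        box_index = y >> (k + 1)        # which box of layer k lies on Bob's path
        y_bit = (y >> k) & 1            # Bob's input to that box
        box_input, a = layers[k][box_index]
        b = a ^ (box_input & y_bit)     # PR-box rule: a XOR b = x AND y
        value ^= b
    return value


if __name__ == "__main__":
    rng = random.Random(5)
    x = [rng.randint(0, 1) for _ in range(8)]
    m, layers = alice_encode(x, rng)
    print(all(bob_decode(m, y, layers) == x[y] for y in range(8)))
    print(sum(len(layer) for layer in layers))  # number of PR-boxes used
```

For a given run Bob can of course decode only one index, since each box accepts a single input; the loop over `y` in the usage example replays the same recorded outputs only to check every case.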


10.2.3 Definition of Information Causality 


Let us denote by $\mathcal B$ the alphabet of Bob’s output $\beta$, by $\mathcal X$ the alphabet of each of 
Alice’s inputs; it’s of course the same alphabet, and for all the examples here we’ll have 
$\mathcal B = \mathcal X = \{0, 1\}$. For every $y$, the protocol that Alice and Bob adopt leads to a family of 
joint probability distributions $P_{B_y,X_y}(\beta_y, x_y)$ for Bob’s output and the input he is asked 
to guess. The amount of information that $\beta_y$ carries about $x_y$ is quantified by the mutual 
information 


$$I(B_y : X_y) = H(B_y) + H(X_y) - H(B_y, X_y) \qquad (10.1)$$


where $H(C) = -\sum_{c \in \mathcal C} P_C(c) \log P_C(c)$ is Shannon’s entropy. By definition of the task, 
Alice’s inputs are uniformly distributed, whence we’ll always have $H(X_y) = \log_2 |\mathcal X|$. 
When $\mathcal B = \mathcal X = \{0, 1\}$, it holds 


$$I(B_y : X_y) = 1 - h\big(P_{B_y,X_y}(\beta_y = x_y)\big) \qquad (10.2)$$


where $h(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$ is the binary Shannon entropy. 
Let us study two examples when Alice’s input is $N = 2^n$ bits (recall that Alice’s inputs 
are uniformly distributed by definition): 


• If the players don’t share any additional resources, and if Alice sends the $\kappa$ bits 
$m = (x_0, \dots, x_{\kappa-1})$ to Bob, then: 


  - For $y = 0, \dots, \kappa - 1$, Bob’s output is equal to Alice’s desired input, i.e., 
    $P_{B_y,X_y}(\beta_y, x_y) = \frac12 \delta_{\beta_y, x_y}$. Therefore $H(B_y) = H(X_y) = H(B_y, X_y) = 1$ and finally 
    $I(B_y : X_y) = 1$. 

  - For $y = \kappa, \dots, N - 1$, Bob’s outcome is uncorrelated with Alice’s input: 
    $P_{B_y,X_y}(\beta_y, x_y) = \frac12 P(\beta_y)$, where $P(\beta_y)$ depends on what Bob does. In any case, 
    $H(B_y, X_y) = H(B_y) + H(X_y)$ and therefore $I(B_y : X_y) = 0$. 

• If the players share $N - 1$ PR-boxes, we have seen that $P_{B_y,X_y}(\beta_y, x_y) = \frac12 \delta_{\beta_y, x_y}$ 
for all $y$ as soon as Alice can send one bit of communication. So $I(B_y : X_y) = 1$ 
for all $y$. 


Now, Information Causality is supposed to capture the “non-aberrant” behavior in 
a random access code, so these examples suggest the following definition (Pawlowski 
et al., 2009): In a random access code with an input alphabet $\mathcal X$, Information Causality 
is satisfied if 


$$I = \sum_{y=0}^{N-1} I(B_y : X_y) \overset{\rm IC}{\le} \kappa \qquad (10.3)$$


where $\kappa = |m| \in [0, N \log |\mathcal X|]$ is the number of bits that Alice communicates to Bob. 
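
To make the criterion concrete, the following Python sketch evaluates the left-hand side of (10.3) for binary alphabets, using the expression (10.2) in terms of Bob's success probability for each $y$. The function names are hypothetical; the two calls in the usage example correspond to the two cases discussed above, with $N = 4$ and $\kappa = 1$.

```python
import math


def binary_entropy(p):
    """h(p) = -p log2 p - (1 - p) log2 (1 - p), with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def mutual_information(p_success):
    """I(B_y : X_y) = 1 - h(P(beta_y = x_y)), as in (10.2)."""
    return 1.0 - binary_entropy(p_success)


def ic_quantity(success_probabilities):
    """Sum over y of I(B_y : X_y): the left-hand side of (10.3)."""
    return sum(mutual_information(p) for p in success_probabilities)


if __name__ == "__main__":
    kappa = 1
    no_shared_resources = [1.0, 0.5, 0.5, 0.5]  # Alice simply sends x_0
    with_pr_boxes = [1.0, 1.0, 1.0, 1.0]        # nested protocol, perfect retrieval
    print(ic_quantity(no_shared_resources) <= kappa)  # IC satisfied
    print(ic_quantity(with_pr_boxes) <= kappa)        # IC violated
```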

It can be proved that (10.3) holds if the shared no-signaling resources are described 
by quantum theory, which of course implies that it holds also for LVs—and we know, 
from the first bullet listed previously, that (10.3) holds even in the absence of shared 
resources.⁵ In fact, IC is satisfied under very reasonable requirements on the information 
theory of the underlying resources. The first proof used three requirements on the 
mutual information (Pawłowski et al., 2009); a later simplification (Al-Safi and Short, 
2011) showed that IC follows from only two requirements on the entropy function H. 
The first is: Whenever the theory describes a resource that can be described in classical 
theory, the value of the entropy must coincide with the Shannon entropy. The second 
is a form of the data-processing inequality, that says that if a transformation is made on 
Bob’s system, its correlations with Alice’s system should not increase. The details and 
the proof are given in Appendix G.8. 

These requirements definitely hold in quantum information theory, but we don’t know 
if one can construct other theories where they hold. To explore whether IC singles out 
the quantum set, the best we can do at this stage is to work out explicit calculations based 
on the behaviors. 


10.2.4 Recovering the Tsirelson bound with IC 


The most striking success of IC is that it can recover the Tsirelson bound (3.5) for the 
(2, 2; 2, 2) scenario without any reference to the algebra of Hilbert spaces, something 
that could not be achieved building on van Dam’s observation (Brassard et al., 2006). 
Assume that Alice and Bob share isotropic boxes defined as hypothetical resources that 
produce the behavior 


$$P_V(a, b|x, y) = V\,P_{PR}(a, b|x, y) + (1 - V)/4. \qquad (10.4)$$


5 This agreement between the trivial, LV and quantum bounds is a very nice feature of this definition of IC. 
It would not hold if, instead of using mutual information, we had defined IC using Bob’s guessing probability 
(Al-Safi and Short, 2011). 
This behavior gives $S = 4V$ for CHSH, and for $V = \frac{1}{\sqrt{2}}$ the behavior (10.4) is the same
as the quantum behavior (3.11). Then, IC cannot be violated for $V \le \frac{1}{\sqrt{2}}$ because in this
case the isotropic box can be realized with quantum resources. We want to show that IC
is violated as soon as $V > \frac{1}{\sqrt{2}}$.

Let's assume that Alice communicates $\kappa = 1$ bit to Bob. For $N = 2$ input bits we have
$P(B_y = x_y) = V \times 1 + (1 - V) \times \frac{1}{2} = \frac{1+V}{2}$ for all choices of $x$, $y$. So, IC will be violated if
$2\left[1 - h\left(\frac{1+V}{2}\right)\right] > 1$, that is for $V \gtrsim 0.78$. This is not yet the result we are looking for.


But when $N = 2^n$, the nested protocol introduced in subsection 10.2.2 applied to the
isotropic box gives

$$P(B = x_y \mid \mathbf{x}, y) = \frac{1 + V^n}{2} \tag{10.5}$$


for all choices of x,y (Exercise 10.1 for N = 4). Crucially, the error does not grow too 
fast because only n boxes are actually used: Recall that Bob uses only one box per layer. 


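Using the Taylor expansion $1 - h\left(\frac{1+z}{2}\right) \ge \frac{z^2}{2 \ln 2}$ for each of the $N = 2^n$ terms of the sum
in (10.3), we find that IC is violated if

$$\frac{(2V^2)^n}{2 \ln 2} > 1. \tag{10.6}$$

In other words, for every value $V > \frac{1}{\sqrt{2}}$ there exists an $n$ such that IC is violated, which
concludes the proof.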
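
The two thresholds quoted above can be checked numerically. The following is only a minimal sketch, assuming that each term of the sum in (10.3) equals $1 - h(P)$ with $P$ given by (10.5) and that one bit is communicated; the function names are illustrative and not part of the original text.

```python
import math

LN2 = math.log(2.0)

def info_per_bit(z: float) -> float:
    """Return 1 - h((1+z)/2) in bits, for 0 <= z <= 1."""
    if z >= 1.0:
        return 1.0
    if z < 1e-4:
        # Leading terms of the series z^2/(2 ln 2) * (1 + z^2/6 + ...),
        # used to avoid numerical cancellation when z is tiny.
        return z * z / (2.0 * LN2) * (1.0 + z * z / 6.0)
    return ((1.0 + z) * math.log1p(z) + (1.0 - z) * math.log1p(-z)) / (2.0 * LN2)

def ic_sum(V: float, n: int) -> float:
    """Left-hand side of (10.3) for N = 2**n input bits, using (10.5)."""
    return (2 ** n) * info_per_bit(V ** n)

def first_violating_n(V: float, n_max: int = 200):
    """Smallest n <= n_max for which ic_sum(V, n) > 1, or None if there is none."""
    for n in range(1, n_max + 1):
        if ic_sum(V, n) > 1.0:
            return n
    return None

def threshold_single_layer(tol: float = 1e-9) -> float:
    """Bisection for the value of V at which the N = 2 protocol saturates IC."""
    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ic_sum(mid, 1) > 1.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print(threshold_single_layer())   # close to 0.78
    print(first_violating_n(0.72))    # a finite n, since 0.72 > 1/sqrt(2)
    print(first_violating_n(0.70))    # None, since 0.70 < 1/sqrt(2)
```

The search over n is capped at a hypothetical `n_max`; for V just above $1/\sqrt{2}$ the required n grows quickly, in line with (10.6).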


10.2.5 Information Causality as a physical principle? 


Remarkable though it is, the recovery of the Tsirelson bound refers to one extremal point 
of the quantum set of the (2, 2; 2, 2) Bell scenario. Results for other parts of the quantum 
set and other Bell scenarios have been difficult to obtain. One of the difficulties is that 
we don’t know any necessary condition for the violation of IC. For instance, for some 
families of boxes the nested protocol described previously does not reach the quantum 
set (Allcock et al., 2009): In this case, we don’t know if it is because the protocol is not 
optimal, or because indeed the remaining boxes satisfy IC. Overall, it is still possible that 
IC identifies the quantum set, but most people conjecture the opposite. 

Furthermore, the extension of IC to multipartite Bell scenarios has not been successful:
All attempts at generalization tried so far ultimately reduce to a bipartite task,
and a very general argument shows that no bipartite task can single out the quantum
set of multipartite Bell scenarios (Gallego et al., 2011). All these developments have
been described in (Pawłowski and Scarani, 2016): Contrary to the wish expressed there,
research in IC has not significantly progressed since.

At any rate, it would be wrong to end on a gloomy note. Even if IC is not the 
desired quantum principle, it is a property of all quantum resources and therefore of 
nature as we currently know it. The impossibility of outperforming classical resources 
in that communication task adds to the no-go theorems conclusively proved in quantum 
information science. 
10.3 Macroscopic Locality


10.3.1 Motivation and formulation 


The possibility of measuring individual quantum systems is so crucial to present-day
quantum science and technologies that we may forget that it is a relatively recent
development in the laboratories. Only in the 1970s did it become possible to observe
interference effects while recording individual events.⁶ Interestingly, these are the same
years when the first attempts at Bell tests were being conducted, less than a decade
before the first conclusive demonstrations of Bell nonlocality (Aspect et al., 1982b; Aspect
et al., 1982a).

One may wonder: If Aspect and coworkers had not had single-quanta detection
techniques, if they could only have measured rather intense currents of photons,⁷
would they have been able to observe nonlocality? In the limit of very large currents,
the answer is no, as we are going to show. Just as Popescu and Rohrlich did with
no-signaling, Navascués and Wunderlich (2010) turned this observation into a candidate
for the quantum principle. They called macroscopically local those behaviors for which
a coarse-graining over many systems washes nonlocality out; as usual, we adopt this
wording without debating its appropriateness.⁸

For the formal definition of macroscopic locality (ML), consider a $(M_A, m; M_B, m)$ Bell
scenario, labeling the outputs as $a, b \in \{0, 1, \ldots, m-1\}$ for convenience. The players have
agreed on an i.i.d. strategy that would generate the behavior $P$. However, the verifier is not
going to play the usual Bell test. Rather, he sends the same inputs $(x, y)$ for $N$ consecutive
queries; from the strings of outputs $(a^{(1)}, a^{(2)}, \ldots, a^{(N)})$ and $(b^{(1)}, b^{(2)}, \ldots, b^{(N)})$ collected in one
such round, he just records the number of occurrences of each value in the string

$$I_x = (I_{0|x}, I_{1|x}, \ldots, I_{m-1|x}) \quad \text{and} \quad J_y = (J_{0|y}, J_{1|y}, \ldots, J_{m-1|y}), \tag{10.7}$$

with

$$I_{a|x} = \sum_{k=1}^{N} \delta_{a^{(k)}, a}, \qquad J_{b|y} = \sum_{k=1}^{N} \delta_{b^{(k)}, b}. \tag{10.8}$$


6 For electron interferometry, it seems that the first demonstration with individual detections was that of 
Merli, Missiroli, and Pozzi at Bologna in 1974. For neutron interferometry, credit is unanimously given to the 
work of Helmut Rauch, based in Vienna but conducting those experiments in Grenoble. 

7 Their source based on atomic cascade producing very few events in the relevant modes, it would have been 
very difficult to produce such intense currents; but this is beside the point here. 

8 Certainly macroscopic locality does not pretend to capture either the essence, or even an essential feature, 
of macroscopicity. This notion has a long history of formulations and is still the object of studies and discussions. 
Topics like “emergence” in condensed matter physics, or the “sorites paradox” in philosophy, are related to 
it. In the foundations of quantum mechanics, macroscopicity plays a double role. First, in the attempts at 
explaining why we typically don’t see quantum effects at our everyday scale: ML builds on this, focusing on the 
impossibility of observing Bell nonlocality. Second, in the efforts to demonstrate quantum effects with unusually 
large objects (“Schrödinger cats”). 
Figure 10.3 The coarse-graining that is used in the definition of Macroscopic Locality. The resource
consists of N identical sources, or N uses of the same source (N = 5 in the figure). The players record only 
the number of times each output is produced. By doing so, their local statistics are unchanged, but the 
correlations between the two players are coarse-grained. 


Translated into laboratory language, it captures the situation where a source is emitting 
i.i.d. pairs that are measured in groups of N at each station, and only the resulting currents
are recorded (Figure 10.3). In this coarse-graining, what is lost is the information about 
which output of Alice’s was correlated with which output of Bob’s. 

If the underlying behavior is the PR-behavior (9.3), the verifier can still certify
nonlocality after the coarse-graining. Indeed, for any $N$, each round would exhibit $I_{0|x} = J_{0|y}$
for $xy = 0$ and $I_{0|x} = J_{1|y}$ for $xy = 1$; and notice that $I_{1|x} = N - I_{0|x}$ and $J_{1|y} = N - J_{0|y}$
since $m = 2$. Suppose now that one applies the local post-processing of majority
vote: Alice outputs $\alpha = 0$ if $I_{0|x} \ge \frac{N}{2}$ and $\alpha = 1$ if not; and Bob does the same for $\beta$.
The post-processed behavior $P(\alpha, \beta|x, y)$ thus obtained is the PR-behavior itself.⁹ Even
before a majority vote, therefore, the coarse-grained data contained nonlocality.
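
The majority-vote argument can be illustrated with a short simulation. This is a sketch under stated assumptions: the PR-box is sampled by drawing a uniform bit for Alice and setting Bob's bit so that a XOR b = xy, ties in the vote are resolved towards 0 as in the text, and all function names are hypothetical.

```python
import random

def pr_box(x: int, y: int, rng: random.Random):
    """One use of the PR-box: a is uniform and a XOR b = x*y."""
    a = rng.randint(0, 1)
    return a, a ^ (x & y)

def coarse_grained_round(x: int, y: int, N: int, rng: random.Random):
    """Return the currents (I_{0|x}, J_{0|y}) after N uses with fixed inputs."""
    I0 = J0 = 0
    for _ in range(N):
        a, b = pr_box(x, y, rng)
        I0 += (a == 0)
        J0 += (b == 0)
    return I0, J0

def majority_vote(count0: int, N: int) -> int:
    """Output 0 if the value 0 occurred at least N/2 times, else 1."""
    return 0 if 2 * count0 >= N else 1

def post_processed_is_pr(N: int, rounds: int = 1000, seed: int = 0) -> bool:
    """Check that alpha XOR beta = x*y in every round, for all inputs."""
    rng = random.Random(seed)
    for x in (0, 1):
        for y in (0, 1):
            for _ in range(rounds):
                I0, J0 = coarse_grained_round(x, y, N, rng)
                alpha, beta = majority_vote(I0, N), majority_vote(J0, N)
                if alpha ^ beta != (x & y):
                    return False
    return True
```

For odd N the check succeeds in every round, because a tie in the vote cannot occur; for even N rare ties may break the PR correlation for xy = 1.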


10.3.2 The set of ML behaviors is Q1


The principle of macroscopic locality states that the verifier should not be able to certify 
nonlocality after the coarse-graining, in the limit N → ∞. We have just seen that the
principle rules out the PR-behavior. Following (Navascués and Wunderlich, 2010), we 
are going to see that the set of ML behaviors is identical to Q1, the first step of the NPA 
hierarchy. 

The proof combines two observations. First, by Fine's theorem, the behavior $P(I_x, J_y)$
of the coarse-grained currents is local if and only if it is the marginal of a joint probability
distribution $P(I_1, I_2, \ldots, I_{M_A}, J_1, J_2, \ldots, J_{M_B})$. Second, the individual currents (10.8) are
sums of i.i.d. random variables; as a consequence, each $I_x$ and each $J_y$ are sums of i.i.d.
random vectors. Both observations are valid for all $N$, but in the limit $N \to \infty$ the central
limit theorem provides a very powerful characterization of ML: If the joint $P$ exists, and if the
detectors have finite resolution of the order $O(\sqrt{N})$, the fluctuations of its variables
around their average must obey a multivariate Gaussian distribution with zero average


9 More precisely, rounds for which $J_{1|1} = I_{0|1} = \frac{N}{2}$ exactly would lead to α = β = 0 in spite of xy = 1. But
such rounds are impossible if N is odd, and of negligible probability if N is even and large.
and covariance matrix Γ ≥ 0. Conversely, if such a Gaussian distribution exists, it defines
a valid P.

The proof will be finished by showing that Γ is the matrix that defines Q1 for the
single-round behavior P. Indeed, let's define the fluctuations as


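$$f_{a|x} = \frac{I_{a|x} - \langle I_{a|x} \rangle}{\sqrt{N}}, \qquad f_{b|y} = \frac{J_{b|y} - \langle J_{b|y} \rangle}{\sqrt{N}}; \tag{10.9}$$

the entries of the covariance matrix are the $\Gamma_{ij} = \langle f_i f_j \rangle$, i.e., with a suitable labeling

$$\Gamma = \begin{pmatrix} \langle f_{a|x} f_{a'|x'} \rangle & \langle f_{a|x} f_{b|y} \rangle \\ \langle f_{b|y} f_{a|x} \rangle & \langle f_{b|y} f_{b'|y'} \rangle \end{pmatrix}. \tag{10.10}$$

The elements of the off-diagonal blocks are terms of the form

$$\langle f_{a|x} f_{b|y} \rangle = \frac{1}{N} \left( \langle I_{a|x} J_{b|y} \rangle - \langle I_{a|x} \rangle \langle J_{b|y} \rangle \right) = P(a, b|x, y) - P(a|x) P(b|y). \tag{10.11}$$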


The terms of the diagonal blocks have a similar form, but can be associated to observable
probabilities only when $x = x'$ ($y = y'$). The matrix (10.10) indeed defines the step Q1
of the NPA hierarchy, up to redefining the measurement operators as $F_{a|x} = \Pi_a^x - \langle \Pi_a^x \rangle \mathbb{1}$
and $F_{b|y} = \Pi_b^y - \langle \Pi_b^y \rangle \mathbb{1}$.
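
The identity (10.11) for the off-diagonal entries can be verified by brute force for small N. The sketch below enumerates all strings of N output pairs for one fixed pair of inputs and compares the exact covariance of the currents with the single-round expression; the behavior `P_example` is a hypothetical distribution chosen only for illustration.

```python
from itertools import product

def covariance_per_copy(P, a, b, N):
    """Exact value of (<I_a J_b> - <I_a><J_b>) / N for N i.i.d. pairs.

    P maps each output pair (a, b) to its probability for one fixed (x, y).
    """
    pairs = list(P)
    mean_I = mean_J = mean_IJ = 0.0
    for outcome in product(pairs, repeat=N):          # all strings of N output pairs
        prob = 1.0
        for pair in outcome:
            prob *= P[pair]
        I = sum(1 for (ak, _) in outcome if ak == a)  # current I_{a|x}
        J = sum(1 for (_, bk) in outcome if bk == b)  # current J_{b|y}
        mean_I += prob * I
        mean_J += prob * J
        mean_IJ += prob * I * J
    return (mean_IJ - mean_I * mean_J) / N

def single_round_value(P, a, b):
    """Right-hand side of (10.11): P(a,b|x,y) - P(a|x) P(b|y)."""
    Pa = sum(p for (ak, _), p in P.items() if ak == a)
    Pb = sum(p for (_, bk), p in P.items() if bk == b)
    return P.get((a, b), 0.0) - Pa * Pb

# Hypothetical single-round behavior for one fixed pair of inputs (illustration only).
P_example = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
```

The enumeration grows as $4^N$ for binary outputs, so it is meant as a check of the formula for small N, not as a way to study the limit of large N.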

As a corollary of this proof, notice that, under the promise that the players are using 
an i.i.d. strategy, the verifier can reconstruct P from the covariance matrix of the 
coarse-grained statistics. Thus, he may infer that they share a nonlocal resource, whose
nonlocality could be manifest if fine-grained detection were available.


10.3.3 Summary of Macroscopic Locality 


Just as with IC, all quantum behaviors satisfy ML: The intuition that one should not 
be able to observe nonlocality in those conditions is vindicated. As a candidate for the 
quantum principle, ML definitely does not single out the quantum set: Rather, it provides 
a physical definition of the larger set Q1 in any Bell scenario, including multipartite ones
(both the definition and the proof extend naturally). In particular, as with IC, the Tsirelson
bound is recovered in the (2, 2; 2, 2) scenario. There are behaviors that satisfy ML but
violate IC (Cavalcanti et al., 2010). 

Can one come closer to Q with similar ideas? One could study ML for finite N: This 
certainly defines a smaller set of behaviors, because the nonlocality of some P will not be 
washed out. Also, one may consider unrealistic resolutions of the detectors, so that other 
moments of the limiting distribution must be studied. Initial studies in these directions 
uncovered features that may be of interest for the experts, but seem to indicate that one
does not recover Q in any case (Zhou et al., 2017). 


10.4 Local Orthogonality, a.k.a. Consistent Exclusivity


The idea of local orthogonality (LO) is based on a simple observation. Two probabilities
$P(a, \ldots|x, \ldots)$ and $P(a', \ldots|x, \ldots)$ are associated in quantum theory to orthogonal
projectors as soon as $a \neq a'$, whatever the inputs and outputs of the other players. As
a consequence, in quantum theory one has $P(a, \ldots|x, \ldots) + P(a', \ldots|x, \ldots) \le 1$. This
idea extends to sums of more terms; in fact, we have already used it to prove the quantum
bound of the GYNI inequality (5.3). But GYNI is violated by no-signaling behaviors:
So this restriction can be elevated to a principle that may single quantum theory out.
Explicitly, the principle of local orthogonality requests

$$\sum_{a, b, c, \ldots;\, x, y, z, \ldots} P(a, b, c, \ldots|x, y, z, \ldots) \le 1 \tag{10.12}$$


if, for every pair of terms in the sum, there is at least one player that has the same input but 
different output (Fritz et al., 2013). Almost simultaneously, the analogous mathematical
idea was being proposed as a principle for contextuality and called exclusivity principle 
(Cabello, 2013). The wording consistent exclusivity was used in a graph-theoretical 
synthesis of all relevant notions and results for both nonlocality and contextuality (Acín 
et al., 2015). 

For bipartite Bell scenarios, LO is automatically satisfied by all no-signaling behaviors,
as first proved in (Cabello et al., 2010). However, some hypothetical bipartite no-signaling
resources can be ruled out by LO by using them in multipartite Bell scenarios.
For instance, the behavior describing two independent PR-boxes $P(a, b, c, d|x, y, z, w) =
P_{PR}(a, b|x, y) P_{PR}(c, d|z, w)$ violates LO in a four-partite scenario since it gives

$$P(0,0,0,0|0,0,0,0) + P(1,1,1,0|0,0,1,1) + P(0,0,1,1|0,1,1,0) + P(1,1,0,1|1,0,1,1) + P(0,1,1,1|1,1,0,1) = \frac{5}{4} \tag{10.13}$$

whereas LO requires this sum of probabilities to be bounded by 1 (see also
Exercise 10.2).
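
Both ingredients of this example, the pairwise local orthogonality of the five events and the value 5/4, can be checked mechanically. The sketch below assumes the usual PR-behavior (probability 1/2 whenever a XOR b = xy) and uses exact rational arithmetic; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def p_pr(a, b, x, y):
    """PR-behavior: probability 1/2 when a XOR b = x*y, else 0."""
    return Fraction(1, 2) if (a ^ b) == (x & y) else Fraction(0)

def two_pr_boxes(outputs, inputs):
    """Product behavior of two independent PR-boxes shared as (A,B) and (C,D)."""
    a, b, c, d = outputs
    x, y, z, w = inputs
    return p_pr(a, b, x, y) * p_pr(c, d, z, w)

# The five events (outputs | inputs) appearing in (10.13).
EVENTS = [
    ((0, 0, 0, 0), (0, 0, 0, 0)),
    ((1, 1, 1, 0), (0, 0, 1, 1)),
    ((0, 0, 1, 1), (0, 1, 1, 0)),
    ((1, 1, 0, 1), (1, 0, 1, 1)),
    ((0, 1, 1, 1), (1, 1, 0, 1)),
]

def locally_orthogonal(e1, e2):
    """True if some player has the same input but a different output."""
    (o1, i1), (o2, i2) = e1, e2
    return any(i1[k] == i2[k] and o1[k] != o2[k] for k in range(len(o1)))

all_orthogonal = all(locally_orthogonal(e, f) for e, f in combinations(EVENTS, 2))
lo_sum = sum(two_pr_boxes(o, i) for o, i in EVENTS)
print(all_orthogonal, lo_sum)   # True 5/4
```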

Thus, a behavior may satisfy LO for one copy but not for two (or more) independent
copies, which makes the study of LO rather convoluted. For instance, it was proved that
the set of behaviors P such that n independent copies satisfy LO is not convex even
in the limit n → ∞. One can also consider independent copies of different behaviors.
In this context, consider the composition of a behavior P with an arbitrary quantum
behavior $P_Q \in Q$: Then $P \times P_Q$ satisfies LO if and only if P belongs to the “almost-quantum
set” described in the next section; and this remains the case also if one considers
the composition of multiple copies. For the proofs of these results and others, we refer
to¹⁰ (Acín et al., 2015). In summary, the LO principle too fails to identify the
quantum set.


10.5 The Current Barrier: The “Almost-Quantum” Set 


Even taken together, the three principles of Information Causality, Macroscopic Locality 
and Local Orthogonality do not single out the quantum set. Rather than proposing other 
principles, some authors suggested that one cannot do much better (Navascués et al., 
2015). Their observation is a powerful one, although it does not have the full strength of 
a no-go theorem. 

The definition of Q’ is based on the requirement (6.2) that the operators that 
describe measurements by different players must all commute. The idea is to relax this 
requirement by demanding that those operators commute when acting on the state that is
used to generate the behavior:

$$[\Pi_a^x, \Pi_b^y] |\psi\rangle = 0 \tag{10.14}$$

for $P(a, b|x, y) = \langle \psi | \Pi_a^x \Pi_b^y | \psi \rangle$. Notice that, for this formulation, it is convenient to make
full use of the unspecified Hilbert space dimensions and assume both the state to be pure
and the measurements to be projective. The generalization for multipartite scenarios
is obvious. The set defined by this condition was called the “almost-quantum” set. It is
strictly larger than Q’: For bipartite scenarios, it is $Q_{1+AB}$ in the NPA hierarchy. It was
then proved that all behaviors in the almost-quantum set satisfy both ML and LO. At the 
moment of writing, it has not been proved that they satisfy IC as well, but all the extant 
examples are compatible with such a conjecture. 

More importantly, it has been conjectured that no criterion defined only on observed 
behaviors will come closer to Q’ than the almost-quantum set. The basis for the 
conjecture is the observation that a behavior carries information about the action of the 
operators only on the corresponding state. While this is not a tight mathematical proof, 
it is a very convincing argument. Current attempts at coming closer to the quantum set
complement principles defined on behaviors with structural properties of the underlying 
theory. 


10 It must be noted that Q1 in that reference is not Q1 of the NPA hierarchy as defined in this book: It is
a definition that is convenient both for contextuality and nonlocality, and it does correspond to the “almost- 
quantum” set in the latter case. 




EXERCISES 


Exercise 10.1. We consider the nested protocol sketched in Figure 10.2: 


1. For N = 4 input bits to Alice, prove that the protocol gives $P(B = x_y \mid \mathbf{x}, y) = \frac{1 + V^2}{2}$ for
the isotropic boxes defined by (10.4).
2. Sketch the drawing of the protocol in the case of N = 8 input bits to Alice. 


Exercise 10.2. Find the value of the expression on the left-hand side of (10.13) when $P(a,
b, c, d|x, y, z, w) = P_V(a, b|x, y) P_V(c, d|z, w)$, a product of two isotropic boxes (10.4). For
which values of V is the principle of Local Orthogonality violated?


11 


Signaling and Measurement Dependence


When you have eliminated the impossible, whatever remains, however improbable, 
must be the truth. 
A. Conan Doyle, The Sign of Four 


Communication (signaling) and measurement dependence are the two possible mechanisms
by which information about some players’ inputs may be available at other players’
location. In this last chapter, we study models that allow for such channels, relaxing the 
strict requirements of no-signaling and measurement independence. 


11.1 Motivation: Towards Ultimate Relaxations 


After some discussions in the second half of chapter 1, we have focused on describing 
nonlocality for no-signaling resources under the assumption of measurement independence. 
Besides being in tune with the most common interpretations of quantum theory and of 
our scientific activity in general, these two assumptions are necessary in the following 
sense: With unrestricted communication, or with unrestricted correlations between the 
shared resource and the inputs, the players can pass trivially any Bell test. 

However, we are going to see that Bell nonlocality is robust to some relaxations of those 
assumptions. This can be understood: Both mechanisms are ultimately a way of carrying 
information about one player’s input to other players: If the information is not sufficient, 
the Bell test retains some relevance. 

Admittedly, once a communication channel is open, or once correlations between 
resource and inputs are allowed, there are no obvious reasons to restrict the shared 
information. This is the reason why these topics have not been studied thoroughly: 
Most experts seem to be content with knowing that no-signaling and measurement 
independence are not tight straitjackets. Besides, only preliminary studies have attempted 
to present both relaxations in the same framework (Hall, 2011; Barrett and Gisin, 2011; 
Chaves et al., 2015).

We shall discuss the relaxation of no-signaling in section 11.2, that of measurement 
independence in section 11.4. Between the two, section 11.3 is devoted to another 
aspect of communication, namely the speed at which such hypothetical influences should
propagate. Finally, section 11.5 presents a new take on randomness in the context of 
measurement dependence. 


11.2 Signaling Models: The Information in the Communication


We begin by studying models that would produce nonlocal behaviors using actual 
communication, in short signaling models. We shall use the word “influences,” rather than 
“signals,” for these hypothetical mechanisms of transferring information. 

There are at least two reasons to be interested in these models. First, they are one of 
the few ways to recover determinism: It’s good to have a clearer idea of the price one 
has to pay. Second, communication being a process that we are familiar with, they can 
provide an insight on the power of Bell nonlocality. 

In this section, we look at the information that the communication must convey, leaving 
aside the infamous fact that these influences should propagate faster than the speed 
of light. The conflict between such models and relativistic space-time descriptions is 
addressed in the next section. 


11.2.1 Fine tuning 


When considering the information carried by the influences in a signaling model, we 
must start by pointing out that these models have a highly conspiratorial flavor: If nature 
is using communication to produce quantum behaviors, why is it hiding it? To my 
knowledge, nobody has ever attempted to propose an answer. 

For those who think that “why” questions should not be asked in physics, the 
conspiracy can be cast in a different way: In order to simulate no-signaling behaviors 
with communication, the use of the communication channel should be very finely tuned. 
Any deviation from this fine tuning would produce behaviors that violate the no-signaling 
conditions,¹ thus uncovering the underlying mechanism. Confirming what one would
intuit, every signaling model must indeed be finely tuned if it has to simulate quantum 
predictions (Wood and Spekkens, 2015; Cavalcanti, 2018). 


11.2.2 Signaling in Bell scenarios 


The influence needs only to carry information about the inputs. In particular, in a Bell
scenario $(M_A, m_A; M_B, m_B)$, all no-signaling behaviors can be trivially realized if Alice
can send to Bob an $M_A$-valued symbol per round (i.e., $\log_2 M_A$ bits), or Bob can send to Alice
an $M_B$-valued symbol ($\log_2 M_B$ bits). Some behaviors may be simulated with much less


1 Insofar as we can tell, this fine tuning is not related to that of the parameters of the universe, which inspires 
the discussions on the “anthropic principle.” 
communication: This is what is studied in the field of communication complexity. The
review (Buhrman et al., 2010) is devoted to its connection with nonlocality. 

An arbitrarily small amount of signaling may be sufficient to produce nonlocality, even
if it is hidden in a no-signaling behavior, as illustrated by the following protocol. In each
round, Alice and Bob share either $(a_0, a_1, b_0, b_1) = (+1, +1, +1, +1)$ or $(-1, -1, -1, -1)$
with the same probability. If Alice receives the input $x = 1$, with probability $p_f$ she sends
this information to Bob, who then flips his bit if he received $y = 1$. The resulting
behavior is $P(a, b|x, y) = \frac{1}{4}[1 + ab(1 - 2 p_f xy)]$: It satisfies the no-signaling condition,
since $P(a|x) = P(b|y) = \frac{1}{2}$, and violates CHSH for any $p_f > 0$, since $S = 2(1 + p_f)$.
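
The protocol is simple enough to be enumerated exactly. The following sketch assumes outputs in {+1, -1}, inputs in {0, 1}, a flip probability passed as a `Fraction`, and the CHSH combination with the minus sign on the (1, 1) correlator; the function names are hypothetical.

```python
from fractions import Fraction

def behavior(a, b, x, y, p_f):
    """P(a,b|x,y) for outputs a, b in {+1, -1} and inputs x, y in {0, 1}."""
    total = Fraction(0)
    for s in (+1, -1):                        # shared value, each with probability 1/2
        # Alice signals only when x = 1, and then only with probability p_f.
        for signal, p_sig in ((True, p_f * x), (False, 1 - p_f * x)):
            bob = -s if (signal and y == 1) else s   # Bob flips if told x = 1 and y = 1
            if (a, b) == (s, bob):
                total += Fraction(1, 2) * p_sig
    return total

def correlator(x, y, p_f):
    """Expectation value of the product a*b for inputs (x, y)."""
    return sum(a * b * behavior(a, b, x, y, p_f) for a in (1, -1) for b in (1, -1))

def chsh(p_f):
    """CHSH value S = E00 + E01 + E10 - E11."""
    return (correlator(0, 0, p_f) + correlator(0, 1, p_f)
            + correlator(1, 0, p_f) - correlator(1, 1, p_f))

def marginal_bob(b, x, y, p_f):
    """P(b|x,y); no-signaling requires it to be independent of x."""
    return sum(behavior(a, b, x, y, p_f) for a in (1, -1))
```

For instance, `chsh(Fraction(1, 10))` evaluates to `Fraction(11, 5)`, that is 2(1 + 1/10), while every marginal stays at 1/2.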


11.2.3 Simulating state-behaviors with communication 


If Alice can choose an arbitrary measurement, her output will steer a corresponding state 
on Bob’s side: Since there are continuously many possibilities, one then may naively guess 
that the simulation of a state-behavior requires an infinite amount of communication. 
This is certainly the case if the players do not share any correlation; but if the players can 
share LVs, a finite amount may be sufficient; and, as it turns out, it is sufficient for all the
known examples of simulations of state-behaviors [see section VI.C of (Buhrman et al.,
2010)]. Besides, the amount of communication averaged over the measurements (including
POVMs) is bounded by O(m²) for any state-behavior with arbitrarily many players and
m outputs per player (Brassard et al., 2019). 

Concretely, the singlet behavior (9.4) can be simulated with a single bit of communication
in each round (Toner and Bacon, 2003). In the version of (Degorre et al., 2005),
the model is reminiscent of Werner’s construction of an LV model (subsection 3.3.1).
Alice and Bob share two vectors drawn uniformly on the unit sphere. Alice selects the
vector that has the larger overlap (in absolute value) with her measurement direction and
communicates this piece of information to Bob; their outputs are determined by the sign
of the scalar product between that vector and their respective measurement directions. The explicit
calculation is given in Appendix G.6. These protocols are explicit examples of the fine 
tuning mentioned in subsection 11.2.1: If they were taken as an accurate description of 
what happens in nature, there is no clear reason why those resources should be used in 
such a contrived way. 
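
The one-bit model just described lends itself to a quick Monte Carlo check. This is only a sketch: the sign convention (Alice outputs minus the sign of her scalar product, Bob outputs the sign of his) is an assumption made here so that the estimated correlator approaches the singlet value, minus the scalar product of the two measurement directions, and the function names are illustrative.

```python
import math
import random

def random_unit_vector(rng):
    """Uniform point on the unit sphere (normalized Gaussian triple)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def sign(t):
    return 1 if t >= 0 else -1

def one_round(a_dir, b_dir, rng):
    """One round of the protocol for measurement directions a_dir and b_dir."""
    lam1, lam2 = random_unit_vector(rng), random_unit_vector(rng)
    # Alice's single bit: which shared vector overlaps more with her direction.
    bit = 0 if abs(dot(a_dir, lam1)) >= abs(dot(a_dir, lam2)) else 1
    lam = lam1 if bit == 0 else lam2
    alice = -sign(dot(a_dir, lam))   # sign convention assumed in this sketch
    bob = sign(dot(b_dir, lam))      # Bob uses the vector that the bit points to
    return alice, bob

def estimate_correlator(a_dir, b_dir, rounds=100_000, seed=1):
    """Monte Carlo estimate of the expectation value of the product of outputs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(rounds):
        alice, bob = one_round(a_dir, b_dir, rng)
        total += alice * bob
    return total / rounds
```

With directions at an angle of 60 degrees, the estimate should be close to -0.5 up to statistical fluctuations of order one over the square root of the number of rounds.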

At any rate, having found a solution for the paradigmatic case of the singlet behavior, 
one could have hoped for quick extensions and generalizations. There have been indeed 
some extensions (Degorre et al., 2009). But there have also been frustrating difficulties: 
At the moment of writing, it is not even known if the state behavior associated to a 
pure, non-maximally entangled state of two qubits can be simulated with a single bit of 
communication.² We also note the connection with the (equally frustrating) study of


2 The trouble seems to be associated with the “Hardy ladder” that we encountered in subsection 4.4.1: 
Most measurements of Alice steer a set of non-orthogonal states on Bob’s side. This observation leads to 
a rather inelegant simulation that uses two bits of communication, and goes as follows. First, Alice chooses 
her measurement direction $\vec{a}$ and generates her output by sampling from the distribution of the partial state.
With this knowledge, she knows which state she must steer on Bob’s side, call it $|{+}\hat{a}\rangle$. Now, she turns to the
the local fraction presented in subsection 9.4.1: Communication would only be needed
in a fraction of rounds, but one must still find how much is needed to reproduce the 
corresponding nonlocal behavior. 


11.3 Signaling Models: The Speed of the Influence 


11.3.1 The trouble with “faster than light” 


I want to start this section by recalling the status of the principle of relativity. It is
trivial that one should be able to transform one’s own coordinates onto those in which
the phenomenon under study is going to be described. Galileo’s principle of relativity
postulates that this transformation is never compulsory (though it is very handy and
routinely used): The laws of physics should be the same in all (inertial) frames. When
applied to electromagnetism, this principle leads to the Lorentz group of transformations,
and nicely agrees with the observation that the speed of light in vacuum c is independent
of the observer’s state of motion. Having the Lorentz group, it is a classic exercise to
prove that, if we were able to send signals faster than light, we could send information
to our past. To do so, observer A sends a signal at a speed v seen in her Lorentz rest
frame. Observer B, who is moving away from A, replies with a signal at a speed v seen
in his Lorentz rest frame. A simple space-time diagram proves that A receives B’s reply
before sending out her query if v > c. Since sending information to the past is regarded
as absurd, no signal should propagate faster than light in vacuum.³

The most elementary signaling model for Bell nonlocality assumes that there is a 
preferred frame for the influence that carries the information. In this frame, one of the 
players (say Alice) happens to be the first to query her physical system. This triggers the 
influence that brings information about x to Bob’s system, propagating at speed v in the 
preferred frame. But now, even if v > c, such a signaling model cannot be tweaked to send 
information to the past, because the influence propagates at that speed in the preferred 
frame, not in each player’s frame. Other signaling models may be built on more general 
definitions or more inventive uses of preferred frames.⁴ None opens the possibility of
sending messages to the past. However, it was proved that the process that generates the 
outputs in signaling models cannot have a Lorentz-invariant description (Gisin, 2011). 

In summary: On the one hand, the greatest reason for fearing faster-than-light 
influences is not a concern; on the other hand, signaling models violate the principle 


Toner-Bacon model simulating the singlet. Alice chooses her measurement in the $\hat{a}$ direction: If the recipe
returns the output −1, with the Toner-Bacon bit of communication she can steer Bob’s statistics to those of the
desired state $|{+}\hat{a}\rangle$. If the output is +1, Bob’s statistics will be steered to those of the state $|{-}\hat{a}\rangle$: Then Alice has
to tell Bob to flip his output, and this is the role of the second bit of communication.


3 A few steps later in the development of the theory, one also finds an energetic reason why massive objects 
cannot travel faster than light. 

4 I entered the field of nonlocality by helping Antoine Suarez to elaborate such a model: Each system sends
out its superluminal signal in the rest frame of the measurement device it meets (Suarez and Scarani, 1997).
The clearest prediction of this model was soon falsified in an experiment (Stefanov et al., 2002), and it was
while working on this that the idea discussed in subsection 11.3.2 was born.
of relativity. Although this principle is not a necessary pre-condition for rational enquiry,
it has proved very fruitful, and applicable to the Universe as we know it. Abandoning 
it adds another item to the feeling of conspiracy: If there are preferred frames, it is not 
clear why they seem to be used only as “ether for Bell nonlocality.” 

All of this may start sounding wild, so let’s refocus by recalling why people may be
interested in such models: To save determinism. The simplest way to bring determinism
back into physics would have been through local variable models. These were falsified
by merely looking at the observable correlations. May one falsify signaling models too on
the sole observation of correlations? The answer is as close to a “yes” as it can be. This is the
object of the next subsection.


11.3.2 The need for infinite speed 


For definiteness, we assume a signaling model using a single preferred frame, although 
the same argument can be exported to other models (Scarani et al., 2014). If the speed 
v of the influence is infinite, any correlation can be distributed, because the information 
becomes instantaneously available everywhere. So, a model with infinite speed cannot be 
falsified by observation. 

A finite speed defines an influence-cone in the preferred frame with the corresponding 
notion of space-like separation (of course, meaningful only in the preferred frame). 
Consider first a bipartite Bell scenario: If the players are sufficiently far apart, a player’s 
output may be produced before the influence carrying information about the other 
player’s input arrives at that location. According to the model, Bell nonlocality should 
disappear. It is refreshing that an alternative model provides testable predictions, and 
indeed experiments were designed to test it (Salart et al., 2008): Observations showed 
that nonlocality was not dimmed—had it been, I would be writing very differently. But 
such experiments can only lead to a lower bound on v, and even that, only given a 
definition of the relevant events (which may be challenged, as discussed in subsection 
1.5.4). A much stronger case against any signaling model with finite speed can be made 
by moving to multipartite Bell scenarios (Scarani and Gisin, 2002). 

We consider N rounds of a Bell test in a three-partite Bell scenario. For the sake of the
argument, we group all the input and output processes of each player as a single event
denoted $\mathcal{E}_{A,B,C}$. A notation like $\mathcal{E}_A \to \mathcal{E}_B$ means that the influence carrying information
about $\mathcal{E}_A$ can reach Bob’s location prior to $\mathcal{E}_B$ taking place.

Besides the existence of the influence in a preferred frame, the signaling model is 
defined by two requirements: 


(R1) The observed behavior may deviate from an expected behavior P* only when 
some of the events are space-like separated with respect to the influence-cone. 
For definiteness, we take the expected behavior to be the quantum one that 
the experimentalists may reconstruct from tomography; but the argument is 
independent of this specific choice. 
(R2) The influence must remain hidden: In no configuration of events should 
faster-than-light communication between players be enabled.⁵ 


Now consider the following configurations of three measurements at different 
locations, with C located between A and B (Figure 11.1): 


($I_B$) $\mathcal{E}_A \to \mathcal{E}_B \to \mathcal{E}_C$. 


($I_A$) $\mathcal{E}_B \to \mathcal{E}_A \to \mathcal{E}_C$. 


(II) $\mathcal{E}_A \to \mathcal{E}_C$ and $\mathcal{E}_B \to \mathcal{E}_C$, but $\mathcal{E}_A$ and $\mathcal{E}_B$ are outside each other’s influence-cone; 
moreover, $\mathcal{E}_C$ is space-like separated from both $\mathcal{E}_A$ and $\mathcal{E}_B$ according to the usual 
light cones. 


By definition, in configurations $I_A$ and $I_B$ we observe the expected behavior $P^*$. We 
denote by $P^{II}$ the behavior observed in configuration II. 

Here comes the key observation. On the one hand, by timing his event $\mathcal{E}_B$, Bob can 
change the configuration between II and $I_B$. On the other hand, with signals traveling at 
the speed of light, information about $\mathcal{E}_C$ can reach Alice before information about $\mathcal{E}_B$. If 


[Figure 11.1: two space-time diagrams drawn in the preferred frame (PF), one for Configuration II and one for Configuration $I_B$.] 


Figure 11.1 The configurations of events used in the argument against hidden influences with finite 
speed. The space-time diagram is drawn in the preferred frame (PF) in which the influence is defined. For 
simplicity, the players’ actions are depicted as instantaneous, so event A corresponds to Alice receiving x 
and producing a; and similarly for B and C. The grey dotted lines represent the hidden-influence cones, 
the black dashed lines the usual light cones. The argument goes as follows: On the one hand, Bob can 
switch between the two configurations by just delaying his event; on the other hand, in both 
configurations, information about B arrives to Alice later than information about C. If we assume that 
the players should not see any faster-than-light communication (i.e., the influences remain “hidden”), 
then the correlations A-C cannot depend on the configuration. A symmetric argument can be made for 
the correlations B-C. A contradiction is then found if those pairs of correlations enforce A-B to be 
nonlocal, since in configuration II the behavior of A-B must be local by definition of the model. 


5 Recall that such influence-enabled faster-than-light communication would not allow sending information 
to one’s past. So the cogency of this requirement is not as strong as usual. It can be seen as a desire to keep a 
“peaceful coexistence” with special relativity. 
$P^{II}_{AC} \neq P^*_{AC}$, Alice would get to know which timing Bob has chosen before any information 
traveling at the speed of light could have carried this piece of information to her. This 
violates requirement R2. Thus, we need to enforce 


$P^{II}_{AC} = P^*_{AC}$. (11.1) 


By the same argument, Alice can change the configuration between II and $I_A$ by timing 
her event $\mathcal{E}_A$, so we need to enforce 


$P^{II}_{BC} = P^*_{BC}$. (11.2) 


There is also a constraint on the third two-player marginal of $P^{II}$. Since no influence 
connects the two events, 


$P^{II}_{AB}$ must be local. (11.3) 


The question is now: Can one find a Bell scenario and an expected behavior $P^*$, such 
that the three constraints (11.1)-(11.3) on $P^{II}$ are incompatible? In other words, we 
are asking if one can define marginal behaviors $P^*_{AC}$ and $P^*_{BC}$ such that all compatible 
$P_{ABC}$ exhibit nonlocality in $P_{AB}$: in itself, an interesting problem in the mathematics 
of nonlocality. 

For the configuration of events that we presented, examples of such $P^*$ were indeed 
found, though they are not quantum behaviors (Coretti et al., 2011). Quantum examples 
were found shortly afterwards, first for a four-partite Bell scenario (Bancal et al., 2012), then for 
a different configuration of events of a three-partite Bell scenario (Barnea et al., 2013). 
There would be little added value in reporting the details here, since those references are 
self-contained. If the reader wants to get a simple grasp of the problem, examples are very 
easy to find if one adds to (11.3) the requirement that $P^{II}$ be quantum too (Scarani and 
Gisin, 2005). 

In summary, all models with a finite-speed influence that remains hidden can be ruled 
out based on observation alone, without having to specify any detail (for instance, which 
is the preferred frame in which the influence propagates). An experimental validation 
would consist of a loophole-free Bell test exhibiting one of the $P^*$. Notice that there 
is no need to argue what the event configuration looks like in the preferred frame:⁶ 
One just has to be convinced that the behavior has not been realized by cheating 
via one of the loopholes. No such experiment has been reported, but the quantum 
behaviors P* found in (Bancal et al., 2012; Barnea et al., 2013) can be realized with few 


6 This is exactly the strength of proper hidden-influence inequalities. In this respect, let us recall the work of 
(Jones et al., 2005) described in subsection 5.3.3. There, it was proved that the Svetlichny inequality is satisfied 
if Alice can signal to both Bob and Charlie but these can’t reply or communicate between them. This is exactly 
what a hidden-influence model would predict for the configuration of events studied in (Barnea et al., 2013). 
But the Svetlichny inequality requires three-partite correlations: Its violation could falsify the hidden influence 
model only if one were sure of the configuration of events in the preferred frame. 
projective measurements on four or three qubits, so nobody doubts that they could be 
observed. 


11.4 Relaxing Measurement Independence 


After relaxing no-signaling, we deal now with the relaxation of measurement 
independence. Measurement dependence (MD) means that, in each round, the process $\lambda$ used 
by the players is correlated with the verifier’s inputs (x, y). As usual, correlation does 
not say anything about causation, so various narratives are possible: The verifier gets to 
know the process and adapts the input, the players get to know the inputs and adapt their 
process, an external cause is influencing both the inputs and the process (this one will be 
used for randomness amplification in the next section), and so on. The different narratives don’t 
make any difference in the mathematics that follow. 


11.4.1 Behaviors under measurement dependence 


For a start, we need to change the definition of the observed behavior (2.5) that we used 
throughout the book, to take into account that the distribution of the processes $\lambda$ now 
depends on the inputs:⁷ 


$P(a,b|x,y) \overset{\text{i.i.d.}}{=} \int d\lambda\, P(\lambda|x,y)\, P(a,b|x,y,\lambda)$. (11.4) 


The behaviors that the players can produce with LV resources, i.e., those of the form 


$P(a,b|x,y) \overset{\text{i.i.d.}}{=} \int d\lambda\, P(\lambda|x,y)\, P(a|x,\lambda)\, P(b|y,\lambda)$, (11.5) 


are called measurement-dependent local (MDL) behaviors. For a MDL behavior, P(a|x, y) 
may depend on y, or P(b|x, y) on x: Even if it’s a convex mixture of local behaviors, the 
average behavior may violate the no-signaling condition. This is a consequence of MD 
and does not mean that the players can actually signal to each other. 
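
The apparent signaling can be made concrete with a toy model. The following Python sketch builds an MDL behavior of the form (11.5); the two deterministic strategies and the numbers chosen for P(λ|x,y) are illustrative assumptions, not taken from the text. Alice's output never uses y, yet her observed marginal depends on it, purely because the choice of λ is correlated with y.

```python
# Toy MDL behavior of the form (11.5) with two hidden processes lam in {0, 1}.
# Both processes are deterministic local strategies (illustrative assumption):
#   Alice outputs a = lam whatever x is; Bob outputs b = 0 whatever y is.

def p_lambda_given_inputs(lam, x, y):
    """P(lam | x, y): hypothetical measurement dependence, lam follows y with prob. 0.9."""
    return 0.9 if lam == y else 0.1

def alice_output(lam, x):
    return lam  # depends on lam only, never on Bob's input y

def bob_output(lam, y):
    return 0

def behavior(a, b, x, y):
    """P(a, b | x, y) = sum_lam P(lam | x, y) P(a | x, lam) P(b | y, lam)."""
    total = 0.0
    for lam in (0, 1):
        pa = 1.0 if alice_output(lam, x) == a else 0.0
        pb = 1.0 if bob_output(lam, y) == b else 0.0
        total += p_lambda_given_inputs(lam, x, y) * pa * pb
    return total

def alice_marginal(a, x, y):
    """P(a | x, y), obtained by summing over Bob's output."""
    return sum(behavior(a, b, x, y) for b in (0, 1))

# Alice's marginal for the same x differs between y = 0 and y = 1.
print(alice_marginal(0, 0, 0), alice_marginal(0, 0, 1))
```

In this sketch no message travels between the players: the dependence on y enters only through the weights P(λ|x,y).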

The definition (11.4) of a behavior assumes i.i.d., just like (2.5). In the case of 
measurement independence, we had eventually proved that the local polytope for non- 
i.i.d. behaviors is the same as that defined by i.i.d. behaviors (subsection 2.6.1). The 
analogous result has not been proved for MD, and for some cases of constrained MD it has 
been proved not to hold (see the end of subsection 11.4.3). 


7 We also denote that distribution with P instead of Q and change the notation from $P_\lambda(a,b|x,y)$ to 
$P(a,b|x,y,\lambda)$. These cosmetic modifications, more suitable for the manipulations in this section, do not change the 
meaning of those objects. The only real mathematical difference is the correlation between $\lambda$ and x, y. 
11.4.2 Imposing restrictions on MD 


Next, we must impose restrictions on MD. These could conceivably be motivated on 
physical grounds, but actual examples have not been found.⁸ At the moment of writing, 
then, any a priori choice of restriction is arbitrary; and we don’t get a real-life feeling for a 
posteriori results of the type “the observed behavior certifies Bell nonlocality even if this 
form of MD is allowed.” Hoping that this unpleasant state be temporary, let’s carry on. 

All the restrictions studied so far are based on quantifying MD according to some 
figure of merit, then declaring that one tolerates at most this or that amount of MD. 
As figure of merit, in the early studies one finds the maximal deviation 
$M = \sup_{x,y,x',y'} \int d\lambda\, |P(\lambda|x,y) - P(\lambda|x',y')|$ (Hall, 2011) and the mutual information 
$H((\mathcal{X},\mathcal{Y}):\Lambda) = H(\mathcal{X},\mathcal{Y}) + H(\Lambda) - H((\mathcal{X},\mathcal{Y}),\Lambda)$, where $H(\mathcal{A}) = -\sum_a P(a) \log P(a)$ 
and $\Lambda$ is the alphabet of the process indices $\lambda$ (Barrett and Gisin, 2011). 

Subsequent studies have found it more convenient to quantify MD by bounding the 
$P(x,y|\lambda)$ according to 


$\ell \le P(x,y|\lambda) \le h$ for all $x, y, \lambda$. (11.6) 

From $P(x,y) = \int d\lambda\, P(x,y|\lambda) P(\lambda)$, there follow the bounds 


$\ell \le \min_{x,y} P(x,y)$, $h \ge \max_{x,y} P(x,y)$ (11.7) 


that reduce to $\ell \le \frac{1}{M_A M_B} \le h$ if $P(x,y) = \frac{1}{M_A M_B}$ for all (x, y), as frequently assumed. 

Let us get a quick hold on these constraints. Consider first the (2, 2; 2, 2) Bell scenario. 
Setting $h = \frac{1}{3}$ and $\ell = 0$ opens the possibility that, in each round, the verifier promises 
not to query one of the four possible pairs of inputs, and the players know which one. 
In this case, the PR-behavior (9.3) can be realized as MDL behavior (Exercise 11.1). 
This implies that the whole no-signaling polytope can be realized with MDL behaviors, 
and Bell nonlocality in that scenario cannot be demonstrated. For a generic $(M_A, m_A; M_B, m_B)$ 
Bell scenario, we can be sure that the whole no-signaling polytope can be 
realized with MDL as soon as $h \ge \frac{1}{M_A + M_B - 1}$ if $\ell = 0$. Indeed, with these values, in 
each round the verifier may choose his inputs only among the $(M_A + M_B - 1)$ pairs 
$(x,y) \in \{(x',\bar{y}), (\bar{x},y') \mid x' \in \mathcal{X}, y' \in \mathcal{Y}\}$, for a given $\bar{x} \in \mathcal{X}$ and $\bar{y} \in \mathcal{Y}$. It can be proved that 
any behavior involving only those inputs is local (Thinh et al., 2013). In all cases, notice 
that MD makes it impossible to certify nonlocality well before reaching $h = 1$, which is 
where the process and the inputs could be fully correlated. 

The constraint (11.6) is also open to generalizations beyond i.i.d. that have been 
widely studied in the mathematical literature on sources of randomness. A natural 
generalization is the so-called Santha-Vazirani (SV) constraint: 


8 Here could be a candidate: In a lab, all the devices are usually connected to the city’s electric grid; 
so, fluctuations in electric power affect both the devices that prepare the state and those that perform the 
measurements. However, in all the examples I am aware of, the effects of power fluctuations do not translate in 
the kind of MD that would be useful in Bell tests. 
$P(x_k, y_k|\lambda, x_{k-1}, \dots, x_1, y_{k-1}, \dots, y_1) \ge \ell$ for all $x, y, \lambda$ and for all $k$. (11.8) 


This describes a non-i.i.d. process $\lambda$ for which all pairs (x, y) have non-zero probability, at 
any round and whatever the past. Obviously this implies that no pair (x, y) has probability 
one either:⁹ So, this process has randomness. 

The loosest constraint is the so-called min-entropy constraint. Assuming that $\lambda$ is a 
process that correlates N rounds, it reads 


$P(\mathbf{x}, \mathbf{y}|\lambda) \le h_N$ for all $\mathbf{x} = (x_1, \dots, x_N)$, $\mathbf{y} = (y_1, \dots, y_N)$, $\lambda$. (11.9) 


Contrary to the SV constraint, the min-entropy constraint is such that some of the 
sequences may have zero probability. For comparison with the i.i.d. case, one may define 
an effective single-round $h_{\mathrm{eff}}$ by setting $h_N = h_{\mathrm{eff}}^N$. 


11.4.3 The MDL polytope, and one inequality 


We are now in a position to discuss the set of MDL behaviors given a constrained 
MD. We devote most of our attention to the i.i.d. case with constraints (11.6). In 
order to use these constraints, we need to re-express MD behaviors (11.4) in terms 
of $P(xy|\lambda)$. This can be done using Bayes’ rule $P(\lambda|xy) = P(xy|\lambda)P(\lambda)/P(xy)$. In the 
resulting expression, the constraints affect both the numerator and the denominator, 
since $P(xy) = \int d\lambda\, P(xy|\lambda)P(\lambda)$. It would be simpler to re-define behaviors as describing 
the joint distribution of outputs and inputs, rather than the conditional distribution, that is 


$P(a,b,x,y) \overset{\text{i.i.d.}}{=} \int d\lambda\, P(\lambda)\, P(x,y|\lambda)\, P(a,b|x,y,\lambda)$. (11.10) 


For this new definition of behavior, the constraints (11.6) are linear. Then it can be 
proved that the set of i.i.d. MDL behaviors thus constrained forms a polytope for all 
values of $\ell$ and $h$ (Pütz et al., 2014; Pütz and Gisin, 2016). The explicit form of the 
polytope has been studied only for the (2, 2; 2, 2) scenario; the number of facets varies 
according to the ranges of $\ell$ and $h$. The CHSH inequality is not a facet (although of 
course it can be studied in the context of MD, see Exercise 11.2). 

Let us focus on one of the facets, possibly the simplest to write down, as well as the 
most studied. It is defined by the inequality 


$I_{MDL} = \ell\, P(0,0,0,0) - h\, [P(0,1,0,1) + P(1,0,1,0) + P(0,0,1,1)] \le 0$. (11.11) 


No short proof is known that this defines a facet, but let’s prove that the inequality indeed 
holds for MDL behaviors. First, we rewrite it using (11.10): 


9 SV sources of randomness are usually presented for bits; the upper bound is then clear: 
$\varepsilon \le P(b_k|\lambda, b_{k-1}, \dots, b_1) \le 1 - \varepsilon$. 
$I_{MDL} = \int d\lambda\, P(\lambda) \Big[ \ell\, P(0,0|0,0,\lambda) P(0,0|\lambda) - h \big[ P(0,1|0,1,\lambda) P(0,1|\lambda) + P(1,0|1,0,\lambda) P(1,0|\lambda) + P(0,0|1,1,\lambda) P(1,1|\lambda) \big] \Big]$. 


Now we can use (11.6) to bound $P(0,0|\lambda) \le h$ in the first term, and $P(x,y|\lambda) \ge \ell$ for the 
three terms in the inner bracket: Then 


$I_{MDL} \le \ell h \int d\lambda\, P(\lambda) \big[ P(0,0|0,0,\lambda) - P(0,1|0,1,\lambda) - P(1,0|1,0,\lambda) - P(0,0|1,1,\lambda) \big]$. 


In the right hand side, there appears a normal Bell expression, without MD: In fact, it’s 
the Eberhard form (2.33) of CHSH. If the initial $P(a,b,x,y)$ is MDL, that is $P(a,b|x,y,\lambda) = 
P(a|x,\lambda) P(b|y,\lambda)$, CHSH cannot be violated and $I_{MDL} \le 0$ holds as claimed. 
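
The bound can also be checked numerically. The sketch below is only an illustration under stated assumptions: the values ELL = 0.1 and H = 0.5, the number of hidden processes, and the sampling scheme are arbitrary choices, and every helper name is hypothetical. It samples joint distributions of the form (11.10) with deterministic local responses and input distributions satisfying (11.6), then evaluates the left-hand side of (11.11).

```python
import itertools
import random

ELL, H = 0.1, 0.5  # illustrative values for the bounds in (11.6)

def random_input_distribution(rng):
    """Sample P(x, y | lam) on the four input pairs with ELL <= p <= H (rejection sampling)."""
    while True:
        cuts = sorted(rng.random() for _ in range(3))
        p = [cuts[0], cuts[1] - cuts[0], cuts[2] - cuts[1], 1.0 - cuts[2]]
        if all(ELL <= q <= H for q in p):
            return dict(zip(itertools.product((0, 1), repeat=2), p))

def random_mdl_behavior(rng, n_lambda=4):
    """Joint distribution P(a, b, x, y) built as in (11.10) from local deterministic strategies."""
    weights = [rng.random() for _ in range(n_lambda)]
    norm = sum(weights)
    joint = {}
    for w in weights:
        p_lam = w / norm
        p_xy = random_input_distribution(rng)
        a_of = (rng.randint(0, 1), rng.randint(0, 1))  # Alice: a = a_of[x]
        b_of = (rng.randint(0, 1), rng.randint(0, 1))  # Bob:   b = b_of[y]
        for (x, y), q in p_xy.items():
            key = (a_of[x], b_of[y], x, y)
            joint[key] = joint.get(key, 0.0) + p_lam * q
    return joint

def i_mdl(joint):
    """Left-hand side of (11.11), with arguments ordered as (a, b, x, y)."""
    p = lambda *key: joint.get(key, 0.0)
    return ELL * p(0, 0, 0, 0) - H * (p(0, 1, 0, 1) + p(1, 0, 1, 0) + p(0, 0, 1, 1))

rng = random.Random(0)
worst = max(i_mdl(random_mdl_behavior(rng)) for _ in range(2000))
print("largest sampled I_MDL:", worst)  # should never exceed 0 (up to rounding)
```

A random search of this kind cannot replace the proof; it only makes the two uses of (11.6) in the argument tangible.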

Let us now study the violation of $I_{MDL} \le 0$ in quantum theory. By non-negativity of 
probabilities, no violation is possible if $\ell = 0$. As soon as $\ell > 0$, we know from Hardy’s 
test (subsection 4.4.1) that $I_{MDL} \le 0$ can be violated with quantum behaviors: Indeed, one 
can set $P(0,1,0,1) = P(1,0,1,0) = P(0,0,1,1) = 0$ while having $P(0,0,0,0) > 0$. 
This result holds independently of $h$. By noticing that $\ell = 0$ implies the possibility of 
excluding one of the pairs of inputs, these observations can be cast in a very appealing 
narrative: Some quantum behaviors can be proved to be nonlocal as long as all the inputs 
are possible in each round, however small the probability of some of them. It was noticed 
later that, in larger Bell scenarios with $M_A = M_B = M > 2$, some MDL inequalities 
are violated even for $\ell = 0$, provided h < WE? which implies that at least 2 pairs of 
inputs must be possible in each round (Pütz and Gisin, 2016). 
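
For readers who want to see a Hardy-type violation explicitly, here is a minimal Python sketch. The state, the measurement bases and the angle t are one possible choice made for illustration and are not taken from subsection 4.4.1; the observed inputs are assumed uniform, P(x,y) = 1/4. Setting 1 is the computational basis, setting 0 is a basis rotated by t, and the state is proportional to |11> - tan(t)(|01> + |10>).

```python
import math

def outcome_vector(setting, outcome, t):
    """Real qubit vector for a given outcome; setting 1 is the computational basis."""
    if setting == 1:
        return (1.0, 0.0) if outcome == 0 else (0.0, 1.0)
    if outcome == 0:
        return (math.cos(t), math.sin(t))
    return (-math.sin(t), math.cos(t))

def hardy_state(t):
    """Amplitudes psi[i][j] of the state proportional to |11> - tan(t) (|01> + |10>)."""
    norm = math.sqrt(1.0 + 2.0 * math.tan(t) ** 2)
    return [[0.0, -math.tan(t) / norm],
            [-math.tan(t) / norm, 1.0 / norm]]

def prob(a, b, x, y, t):
    """Born rule P(a, b | x, y) for real amplitudes."""
    psi = hardy_state(t)
    u = outcome_vector(x, a, t)
    v = outcome_vector(y, b, t)
    amplitude = sum(u[i] * v[j] * psi[i][j] for i in (0, 1) for j in (0, 1))
    return amplitude ** 2

def i_mdl_quantum(ell, h, t, p_inputs=0.25):
    """Left-hand side of (11.11) for the joint distribution P(a,b,x,y) = p_inputs * P(a,b|x,y)."""
    joint = lambda a, b, x, y: p_inputs * prob(a, b, x, y, t)
    return ell * joint(0, 0, 0, 0) - h * (joint(0, 1, 0, 1) + joint(1, 0, 1, 0) + joint(0, 0, 1, 1))

t = math.pi / 4
print(prob(0, 0, 0, 0, t))             # strictly positive
print(i_mdl_quantum(0.01, 0.9, t))     # positive for any ell > 0, whatever h
```

With this choice the three subtracted probabilities vanish up to rounding, so the sign of the expression is decided by the first term alone, as in the argument above.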

The systematic study of non-i.i.d. strategies has been attempted only for the (2, 2; 
2, 2) scenario under the min-entropy constraint (11.9); even then, rather partial results 
were obtained (Tan et al., 2016). One of them is worth mentioning because it shows, 
as claimed previously, that the MDL polytope changes significantly. Consider an N = 
2 min-entropy constraint for the MD process. One can run this process and extract a 
single-round effective behavior P(a, b|x, y) from the frequencies; in narrative terms, we 
can say that the verifier treats the process as being i.i.d. while it is not. For $h_{\mathrm{eff}} \gtrsim 0.255$, an 
MDL strategy can give an effective behavior that violates $I_{MDL} \le 0$; whereas in the i.i.d. case, 
$h \approx 0.255$ forces a strictly positive $\ell = 1 - 3h \approx 0.235$. In summary, a criterion 
that demonstrates nonlocality for an i.i.d. MD scenario becomes inconclusive for a min- 
entropy constraint. 


11.5 Randomness Amplification 


11.5.1 DI certification and measurement dependence 


The possibility of DI certification follows from the certification of nonlocality. The latter 
is usually done under strict measurement independence and no-signaling (as we assumed 
in Part II and section 9.3). But the certification of randomness has also been the object of 
Figure 11.2 The idea of randomness amplification. The output of a Santha-Vazirani source (11.13) is 
used to choose both the measurement settings and the state to be used in a Bell test, thus introducing clear 
measurement dependence. The goal is to prove that, even in this situation, the outputs of the Bell test can 
define a process that is arbitrarily close to a perfect coin (at least for some values of $\varepsilon$). Thus, the Bell test 
(dashed box) acts as an unseeded extractor of a SV source, something that is impossible in information 
theory with classical resources. 


exploratory studies when nonlocality is certified under partial measurement dependence 
(Koh et al., 2012) or bounded signaling (Silman et al., 2013). 

In this section, we rather give a different twist to the relation between nonlocality 
and randomness in the presence of measurement dependence. Discussing randomness 
extraction (subsection 8.2.2) we pointed out that seeded extraction is possible for fairly 
generic sources, whereas deterministic extraction is only possible in limited cases. In 
particular, the randomness of SV sources (11.8) cannot be extracted deterministically. 
Colbeck and Renner (2012) proved that the randomness of an SV source can be extracted 
by using the original source as input of a Bell test and taking the outputs as the new source 
(Figure 11.2). This result could be presented as another example in which quantum 
information processing achieves something that is impossible in classical information 
processing. I prefer to read it as a proof that a Bell test generates genuine independent 
randomness, as needed to extract randomness from a SV source. 


11.5.2 Setting the stage 


The proof of Colbeck and Renner uses the chained inequalities that we studied in 
subsection 4.2.3. The inequality uses 2M pairs of settings, so we can assume that the 
source of randomness picks only those pairs. On average over many rounds, each pair is 
chosen with the uniform distribution 


$P(x,y) = Q(x,y) = \frac{1}{2M}$. (11.12) 


Claire however knows that the settings are chosen with a SV source defined by 


$\frac{1-\varepsilon}{2M} \le P(x,y|\Lambda) \le \frac{1+\varepsilon}{2M}$, (11.13) 
where $\Lambda$ is a shorthand notation for both the process $\lambda$ chosen in that round (this is 
measurement dependence) and the inputs of the previous rounds. 

In chapter 8, we considered randomness for someone that has the most precise 
description of the process in each round allowed by quantum theory; and we focused 
mostly on secret randomness, i.e., for an adversary outside the secure location. Here, 
the goal is different: We start with a source of randomness, and secrecy is not a direct 
concern. In this situation, von Neumann extraction would be possible if the source 
were i.i.d. (subsection 8.2.2). We are going to show how using nonlocality one can 
obtain an i.i.d. source of bits starting with the SV source (11.13). This step is called 
randomness amplification; extraction would then be completed by von Neumann’s or other 
procedures (although in this particular example the output bits are already unbiased). 

The figure of merit is the variational distance (9.12) between the actual distribution 
and the desired one. Let us set the distance between the SV source (11.13) and the 
unbiased i.i.d. source (11.12) to¹⁰ 


$D[P_{x,y|\Lambda}, Q_{x,y}] \le \frac{\varepsilon}{2}$. (11.14) 
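
As a quick sanity check of how the bias and the distance are related, the sketch below assumes that the SV constraint (11.13) confines each of the 2M probabilities to the interval [(1-ε)/2M, (1+ε)/2M] and that the variational distance (9.12) carries the usual factor 1/2 in front of the sum of absolute differences. Both are assumptions of this illustration, as are the helper names. The most biased distribution allowed then sits at distance exactly ε/2 from the uniform one.

```python
def variational_distance(p, q):
    """D[P, Q] = (1/2) * sum_i |p_i - q_i| (assumed normalization)."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def extreme_sv_distribution(M, eps):
    """Most biased distribution on 2M pairs: half the pairs favored, half disfavored."""
    favored = [(1 + eps) / (2 * M)] * M
    disfavored = [(1 - eps) / (2 * M)] * M
    return favored + disfavored

M, eps = 5, 0.1
p = extreme_sv_distribution(M, eps)
q = [1 / (2 * M)] * (2 * M)          # uniform distribution (11.12)
print(variational_distance(p, q))    # close to eps / 2
```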


11.5.3 Derivation 


The proof of the possibility of randomness amplification uses similar tools as in 
subsection 9.4.2: A bound on a variational distance through the chained inequality. Again 
we refer to Appendix G.7 for the manipulations of the variational distance. 

We consider the chained inequality in the form (4.7) for given $\Lambda$: 


$C_{M|\Lambda} = P(a=b|1,M,\Lambda) + \sum'_{x,y} P(a \neq b|x,y,\Lambda)$, 


where $a, b \in \{0, 1\}$ and where the prime on the sum selects only the pairs of settings that 
enter the inequality (those that the SV source can select). We can follow the proof of 
subsection 9.4.2 till (9.14), which reads here $C_{M|\Lambda} \ge D[P_{A|x=1,\Lambda}, P_{\bar{A}|x=1,\Lambda}]$, 
with $\bar{A}$ the flipped output, that is 


$C_{M|\Lambda} \ge 2 D[P_{A|x=1,\Lambda}, Q_A]$ (11.15) 


where $Q_A(a) = \frac{1}{2}$ represents the i.i.d. source with unbiased marginals. 

Now we have to relate the bound (11.15) on $C_{M|\Lambda}$ to the observed value of $C_M$. This is 
not immediate because of measurement dependence: Indeed, $C_{M,obs} \neq \sum_\Lambda P(\Lambda)\, C_{M|\Lambda}$, 
but rather 


10 Notice that (Colbeck and Renner, 2012) defined $D[P_{x,y|\Lambda}, Q_{x,y}] \le \varepsilon$, so there is a factor of two between 
our and their definition. 
$C_{M,obs} = \sum_\Lambda P(\Lambda|1,M)\, P(a=b|1,M,\Lambda) + \sum'_{x,y} \sum_\Lambda P(\Lambda|x,y)\, P(a \neq b|x,y,\Lambda)$. (11.16) 


By manipulating this expression: 


$C_{M,obs} = \sum_\Lambda P(\Lambda|1,M) \Big[ P(a=b|1,M,\Lambda) + \sum'_{x,y} \frac{P(\Lambda|x,y)}{P(\Lambda|1,M)}\, P(a \neq b|x,y,\Lambda) \Big]$ 
$\ge q_M(1,M) \sum_\Lambda P(\Lambda|1,M)\, C_{M|\Lambda}$ 
$\ge 2 q_M(1,M)\, D[P_{A,\Lambda|x=1,M}, Q_A \times P_{\Lambda|x=1,M}]$ (11.17) 


where for the last line we have used (11.15) together with the no-signaling condition. We 
have defined 


$q_M(1,M) = \min_{x,y,\Lambda} \frac{P(\Lambda|x,y)}{P(\Lambda|1,M)} \ge \left[\frac{1-\varepsilon}{1+\varepsilon}\right]^{2\log_2 M}$ (11.18) 


with the minimum taken on the pairs of settings with non-zero probability, and where 
the bound follows from (11.14). 
Finally, let us insert into (11.17) the best quantum value (in this case, a minimum) 
$C_{M,obs} = 2M \sin^2\frac{\pi}{4M} \le \frac{\pi^2}{8M}$: 


$D[P_{A,\Lambda|x=1,M}, Q_A \times P_{\Lambda|x=1,M}] \le \frac{\pi^2}{16M} \left[\frac{1+\varepsilon}{1-\varepsilon}\right]^{2\log_2 M}$. (11.19) 


The r.h.s. tends to zero for large M for $\varepsilon < (\sqrt{2} - 1)^2 \approx 0.172$. In conclusion, if the initial 
bias is not too large, the process that produces Alice’s outputs for x = 1 is as close as one 
wishes to an unbiased i.i.d. coin. 
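
The threshold on the bias can be explored numerically. The sketch below assumes that the right-hand side of (11.19) reads (π²/16M)[(1+ε)/(1-ε)]^(2 log₂ M); the function name and the sampled values of M and ε are illustrative choices. Since this expression scales as a power of M with exponent 2 log₂[(1+ε)/(1-ε)] - 1, it decreases with M below the threshold and grows above it.

```python
import math

def amplification_bound(M, eps):
    """Assumed r.h.s. of (11.19): (pi^2 / 16M) * [(1 + eps) / (1 - eps)]^(2 log2 M)."""
    return math.pi ** 2 / (16 * M) * ((1 + eps) / (1 - eps)) ** (2 * math.log2(M))

# Bias at which the exponent of M changes sign: (1 + eps) / (1 - eps) = sqrt(2).
threshold = (math.sqrt(2) - 1) ** 2

for eps in (0.10, 0.20):
    values = [amplification_bound(M, eps) for M in (2, 2 ** 5, 2 ** 10, 2 ** 20)]
    print(f"eps = {eps:.2f}:", ["%.3e" % v for v in values])
print("threshold:", threshold)
```

Below the threshold the decay is slow, so a large number of settings M is needed before the bound becomes small.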

One of the crucial elements of the proof is the fact that the quantum bound and the no- 
signaling bound of the chained inequality coincide. We know other inequalities with that 
property, e.g., the MABK inequalities for an odd number of players (section 5.2). With 
this insight, it did not take long after the Colbeck-Renner breakthrough to find a five- 
player Bell test that would allow randomness amplification for any $\varepsilon < \frac{1}{2}$ (Gallego et al., 
2013). Both results are very sensitive to small deviations from the perfect quantum value. 
A series of further improvements followed, reviewed in (Pivoluska and Plesch, 2014). At 
the moment of writing, possibly the best result is the randomness amplification of a SV 
source based on the inequality $I_{MDL}$ (11.11) obtained using the entropy accumulation 
theorem (Kessler and Arnon-Friedman, 2017). 
EXERCISES 


Exercise 11.1. Assume that $h = \frac{1}{3}$ and $\ell = 0$ in (11.6). For this case of measurement 
dependence, design a LV model that reproduces the PR-behavior (9.3). 


Exercise 11.2. We study CHSH in the presence of MD. We have said that CHSH is not 
a facet of the MDL polytope, so it is not the most robust inequality against MD. But it has 
been the most tested one, so it is interesting to ask to which extent the reported observations are 
tolerant against MD. 


1. Prove that the observed correlation coefficients $E_{xy}$ are given by 


$E_{xy} = \frac{1}{P(x,y)} \int d\lambda\, P(\lambda)\, P(x,y|\lambda)\, E^\lambda_{xy}$, (11.20) 


where $P(x,y) = \int d\lambda\, P(\lambda) P(x,y|\lambda)$. Hint: Out of N rounds, each $\lambda$ is used in $N_\lambda = 
P(\lambda) N$ rounds on average; and $E^\lambda_{xy}$ is probed in $P(x,y|\lambda) N_\lambda$ rounds on average. 

2. Consider the following MD strategy: There are four $\lambda$’s denoted by $\lambda_{x'y'}$, $x', y' \in \{0, 1\}$, 
each used with probability $P(\lambda_{x'y'}) = \frac{1}{4}$. They are defined by 

$P(x,y|\lambda_{x'y'}) = p + (1 - 4p)\, \delta_{x,x'} \delta_{y,y'}$ 

for $0 \le p \le \frac{1}{4}$. Find that in this case the CHSH expression is given by 

$S = \sum_{x',y'} \sum_{x,y} (-1)^{xy}\, P(x,y|\lambda_{x'y'})\, E^{\lambda_{x'y'}}_{xy}$. 


3. Convince yourself that the local bound of this inequality has already been studied in 
Exercise 1.4, with q = 1 − 3p [refer to (Koh et al., 2012) for the study of its violation 
in quantum theory: In particular, as long as it is not achievable with LVs, its maximal 
violation self-tests the maximally entangled state of two qubits]. 


12 
Epilogue 


Seul l’avenir nous dira de quoi le futur sera fait. 
Only future will tell what tomorrow has in stock. 


Attributed to C. Constantin 


Many popular accounts on nonlocality build on the catchy slogan “Einstein was wrong.” 
He may well have been; but prescient, he surely was: He realized that, if local hidden 
variables are not the explanation, our view of physics or of nature will be deeply affected. 
After Bell, we can only choose how it will be affected. 

We can adopt the “orthodox” view, that decrees that one should not try to describe 
individual events. Individual events are used to update one’s knowledge; when a usable 
pattern emerges, in the form of a quantum state, it enables future predictions, all of 
statistical nature. This is “just standard quantum mechanics,” as some of my colleagues 
like to say. This should not make it less shocking: individual events have no place in 
physics, the description of a round of an experiment is unspeakable. 

If we want physics to say something about those individual events, nonlocality kicks 
in fully, with influences that propagate at infinite speed in a preferred frame. Granted, 
we may change the wording (infinitely rigid ether, quantum potentials, pilot waves, 
preferred foliation ...) or the narrative (add some convenient extra dimensions, massless 
wormholes, or other science fiction constructs). But the basic fact remains: A description 
of those individual events won’t fit in the space-time setting we are familiar with. 

Some take refuge in bigger conspiracies, from incredible measurement dependence 
to full blown superdeterminism. They claim that they can live meaningful lives, and even 
have moral duties like that of promoting science, while adopting such a worldview. I 
won't be able to follow them there, but this may just mean that I have been programmed 
differently. 

One of my few certainties is that the future of physics won’t be milder, that more 
shocks are in stock for the future. Nonlocality was discovered as one of the many 
phenomena predicted through quantum theory, but it has the potentiality to acquire 
the status of principle. Will it be the solid base from which to take the next leap? 
Appendix A 
History Museum 


This appendix covers a few of the milestones in the debate about indeterminism in quantum theory. 
The pre-history of the mathematical object “Bell inequalities” starts with George Boole’s The Laws 
of Thought of 1854; it has been reviewed in (Pitowsky, 1989). 


A.1 Heisenberg’s Uncertainty Relations and the Question 
of Indeterminism 


As we said in the main text, traditional textbooks argue in favour of intrinsic indeterminism by 
invoking Heisenberg’s uncertainty relations: The better one can predict the result of a measurement 
of position, the worse one can predict the result of a measurement of momentum. The orthodox 
interpretation of quantum theory does consider the uncertainty relations as a manifestation of 
intrinsic indeterminism. However, the cogency of the argument is limited by several considerations: 


• Ambiguous interpretation. The orthodox interpretation does not impose itself. When 
pressured to explain uncertainty to the general public, Heisenberg himself attributed it to 
an alleged intrinsic limitation in the measurement process. His explanation based on the 
“Heisenberg microscope” has been repeated countless times, also by authors who were 
otherwise endorsing the orthodox view on quantum indeterminism. Alternatively, one may 
also consider intrinsic fluctuations in the preparation: The source may produce a sharp pair 
(x, p) in each round, but there is something that prevents it from always producing the 
same pair. 


• Reliance on quantum theory. Let us anyway assume the orthodox interpretation, that the values 
of some variables are really not sharply defined. Without quantum theory, we wouldn’t know 
which form of unpredictability this gives rise to. The “natural” guess would be that every 
physical quantity has some intrinsic noise, but this is not what nature seems to have opted 
for. Rather, at least two physical quantities must be considered to see some intrinsic noise.¹ 
And it’s impossible to quantify the amount of intrinsic noise a priori: At which point would 
one have to admit “I can’t do better, I have reached the limit imposed by nature”? 


• Reliance on characterization of the devices. Let us even assume the validity of quantum theory, 
so that we know what to test and which bounds to expect. The uncertainty relations cannot 


1 Gravity may be an exception as hinted by Matvei Bronstein in the 1930s with the following argument. 
Bohr and Rosenfeld had shown that a sharp measurement of field components is possible in principle: It uses 
a probe particle such that A — œ, Where e is the charge associated to the field and m the mass. But for gravity, 
the charge associated to the field is the mass itself. 
be tested without a very rigorous characterization of the devices: Which degree of freedom 
they measure, and how well calibrated they are. Nobody will doubt, for instance, that if two 
orientations of a Stern-Gerlach magnet are believed to be exactly orthogonal but are not, the 
uncertainty relations for orthogonal directions may be violated. At a similar level of triviality, 
if the apparatus that measures momentum is not switched on, its pointer will indicate 
p= 0 in each round, leading to estimate Ap = 0 irrespective of Ax. An unwitting combination 
of bad calibration and residual excessive noise may lead to a very tight “verification” of the 
uncertainty relations. 


Because of all this, it is not surprising that the claim of intrinsic unpredictability was relegated 
to a question of interpretation prior to Bell nonlocality. In particular, it was certainly possible 
to consider the indeterministic formalism as an effective tool, to be replaced one day by a more 
comprehensive theory that would restore determinism. 


A.2 The EPR Argument 


The article by Einstein, Podolsky, and Rosen (Einstein, Podolsky, and Rosen, 1935) is often cited 
as “EPR paradox.” I prefer to call it EPR argument, as I don’t find anything paradoxical with it— 
unless one wants to call “paradox” the contradiction between pre-determined values and quantum 
theory. 

EPR considered the state of two particles, each with its own position and momentum operators 
satisfying $[x_j, p_j] = i\hbar$. Since $[x_1 - x_2, p_1 + p_2] = 0$, one could try and define the state that satisfies 
both 


$x_1 - x_2 = d$ and $p_1 + p_2 = u$ (A.1) 


with $d, u \in \mathbb{R}$. As usual, the requirement of sharp position and momentum leads to unnormalizable 
wave-functions; but it is sufficient to add an arbitrary amount of spread to define a valid state. 

EPR then reasoned as follows: If the measurement of the position of particle 1 yields $x_1 = x$, 
we know for sure that the result of a measurement of position of particle 2 would be $x_2 = x - d$. 
Similarly, if the measurement of momentum of particle 2 yields $p_2 = p$, we know that a 
measurement of momentum on particle 1 would have certainly given $p_1 = u - p$. This is perfectly 
valid quantum theory.² But then, they asked, how can such perfect correlations be accounted for by 
the orthodox interpretation, according to which the results of the measurements do not pre-exist? 
The whole book has been devoted to this question, so we won’t dwell on it here. 

It is interesting to notice that, even if EPR had been aware of Bell inequalities and had tried 
to violate them, they were handling one of the most difficult states for the task. Surely it’s a pure 
entangled state, so a violation can be found for suitable measurements (Banaszek and Wodkiewicz, 
1998). But the state has a positive Wigner function: So, as long as both particles undergo arbitrarily 
many measurements of the type $\cos\theta\, x + \sin\theta\, p$, there exists a LV model that describes all the 
statistics. In particular, the statistics of $(x_1, p_1; x_2, p_2)$ do have a LV description, as the EPR 
reasoning shows. 


2 In particular, contrary to a widespread misunderstanding in popular accounts, the uncertainty principle 
is not violated. Indeed, given the EPR state, the distribution of $x_1$ is uniformly spread over the line, so statistics 
would show a very large variance; and the same holds for $x_2$, $p_1$, and $p_2$. So $\Delta x_j \Delta p_j$ is arbitrarily large for both 
$j = 1, 2$. 
A.3 Von Neumann's Observation on Pre-Established Values 


Von Neumann’s observation on pre-established values is part of his derivation of the Born rule 
(a derivation later superseded by Gleason’s theorem). Once extracted, the argument is ultimately 
rather simple; in fact, it’s used in several basic introductions to quantum theory. Let me phrase 
it with spins. On the one hand, a measurement of a spin $\frac{1}{2}$ will yield $\pm 1$ (in some units) in any 
direction. On the other hand, a spin is a vector, so the component in the $\hat d = \frac{\hat x + \hat y}{\sqrt 2}$ direction should 
satisfy $S_d = \frac{S_x + S_y}{\sqrt 2}$. But if we assign pre-established values $S_x = +1$ or $-1$, and $S_y = +1$ or $-1$, 
there is no chance that $S_d = +1$ or $-1$ too. 

This is an important observation on deterministic local hidden variable models: If there are 
pre-established values, their algebra can’t respect the tensorial character of the physical objects 
that they describe. As it often happens, a complex statement was transmitted as a much simpler 
slogan, namely: Local hidden variables are ruled out. It was Bell (1966) who pointed out the flaw 
in the slogan: To measure the spin in the $\hat d$ direction, the Stern-Gerlach magnet must be oriented 
in a direction that is macroscopically distinguishable from any other. As such, it is a completely 
different measurement, so there is nothing irrational in assuming that its output is not algebraically 
related to that of other measurements. 

We’ll bring up similar topics in Appendix D, dedicated to local variable models. 
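
The counting behind von Neumann’s observation can be checked by brute force. The sketch below assumes, for illustration, that the third direction is the diagonal between two orthogonal axes x and y, so that a vector-like algebra would demand S_d = (S_x + S_y)/√2; the helper name `diagonal_component` is hypothetical.

```python
import math
from itertools import product

def diagonal_component(s_x, s_y):
    """Component along (x + y)/sqrt(2) if pre-established values added like a vector."""
    return (s_x + s_y) / math.sqrt(2)

# Enumerate all assignments of pre-established values S_x, S_y in {+1, -1}.
values = {(s_x, s_y): diagonal_component(s_x, s_y)
          for s_x, s_y in product((+1, -1), repeat=2)}

for assignment, s_d in values.items():
    print(assignment, round(s_d, 4))

# Check whether any assignment yields an allowed outcome +1 or -1 along the diagonal.
allowed = any(math.isclose(abs(s_d), 1.0) for s_d in values.values())
print("some assignment gives +/-1:", allowed)
```

Every assignment produces either 0 or ±√2 along the diagonal, never ±1, which is the algebraic tension that the argument exploits.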


A.4 Bell's 1964 Inequality 


The last piece of our museum is Bell’s original inequality. Equation (15) in (Bell, 1964) reads 

$$1 + P(\vec b, \vec c) \geq |P(\vec a, \vec b) - P(\vec a, \vec c)| \tag{A.2}$$

where $P(\vec a, \vec b)$ is a notation for the correlation coefficient $\langle a_x b_y \rangle$ with $\vec a = \hat a_x$ and $\vec b = \hat b_y$. We see 
that Alice uses measurement directions $(\vec a, \vec b)$, while Bob uses measurement directions $(\vec c, \vec b)$. In his 
derivation, Bell assumed that 

$$P(\vec b, \vec b) = -1 \tag{A.3}$$


holds, as for the singlet state in quantum theory. Under this assumption, it is easy to check that the 
inequality (A.2) is a version of CHSH. 

Because of this assumption, Bell’s work already proved that the predictions of quantum theory 
are at odds with any local hidden variable theory. However, perfect anti-correlations are never 
observed in experiments: In order to perform a Bell test, one needs inequalities like CHSH that 
are independent of such idealizations. 
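
To see the conflict concretely, one can plug the singlet correlations into Bell’s inequality (A.2), read as 1 + P(b, c) ≥ |P(a, b) − P(a, c)|. The sketch below assumes the standard quantum prediction P(a, b) = −a·b for the singlet and a hypothetical choice of three coplanar directions at 0°, 60°, and 120°; the function names are illustrative.

```python
import math

def singlet_correlation(angle_1, angle_2):
    """Quantum prediction for the singlet: P = -cos(angle between the two directions)."""
    return -math.cos(angle_1 - angle_2)

def bell_1964_sides(a, b, c):
    """Return (lhs, rhs) of Bell's inequality 1 + P(b, c) >= |P(a, b) - P(a, c)|."""
    lhs = 1 + singlet_correlation(b, c)
    rhs = abs(singlet_correlation(a, b) - singlet_correlation(a, c))
    return lhs, rhs

# Three coplanar directions, given by their angles in the plane (illustrative choice).
a, b, c = 0.0, math.radians(60), math.radians(120)
lhs, rhs = bell_1964_sides(a, b, c)
print(f"lhs = {lhs:.3f}, rhs = {rhs:.3f}, inequality holds: {lhs >= rhs}")
```

With these settings the left-hand side is smaller than the right-hand side, so the inequality fails, while the assumption (A.3) of perfect anti-correlation for equal directions is satisfied by construction.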


Appendix B 
Experimental Platforms: A Reading Guide 


This appendix provides a quick overview of the most important experimental implementations of 
Bell tests. We don’t go into any detail and just aim at lowering the potential barrier needed to read 
the research articles. The few cited experimental articles and the reviews (Genovese, 2005; Pan 
et al., 2012) should be sufficient to start looking for more detailed information. 


B.1 Photons 


The vast majority of Bell tests have been conducted with pairs of entangled photons. The 
entanglement is often in polarization, although other degrees of freedom have been explored (subsection 
B.1.2). The main challenge in conducting a proper Bell test is the efficiency of single-photon 
counters: Conventional semiconductor detectors have efficiencies in the range 10–50%, too 
low to close the detection loophole even if there were no losses. The loophole was eventually 
closed using superconducting detectors, which must be operated at low temperatures (in dilution 
refrigerators). 


B.1.1 Sources 


The early experiments used an atomic cascade as source (Aspect et al., 1982b; Aspect et al., 1982a). 
The mechanism is intuitively clear. The electronic state of an atom is prepared in an excited 
state that can decay through two possible paths: First emit a photon polarized left-circular then 
one polarized right-circular, or the opposite. If the first and the second transition have the same 
wavelength in both paths, the two processes are indistinguishable and the photons are entangled 
in polarization. 

The big inconvenience of this source is that the emission can occur in a very large solid angle, 
so only rarely are the photons emitted in the directions that are collected by the measurement 
devices. This is why this type of source was completely abandoned in favor of sources based on 
spontaneous parametric down-conversion. In this process, light from a laser at frequency $\omega_L$ impinges 
from a direction $\vec k_L$ on a crystal that has non-linear optical response. The three-wave-mixing 
component $\chi^{(2)}$ can give rise to pairs of excitations that satisfy the phase-matching conditions 
$\omega_a + \omega_b = \omega_L$ and $\vec k_a + \vec k_b = \vec k_L$. By solving these equations given the dispersion relation 
$\omega(\vec k)$ in the crystal, the spatial modes $a, b$ in which the photons are emitted are sharply defined 
for given frequencies $(\omega_a, \omega_b)$. It’s good to keep in mind that the down-conversion efficiency 
is typically $10^{-4}$ (that is, the process produces one pair of photons per $10^{4}$ photons in the 
pump beam). 

Next, we discuss in which degrees of freedom such photons can be entangled. 
B.1.2 Entangled degrees of freedom 


Most experiments with photons, including two of the loophole-free ones (Giustina et al., 2015; 
Shalm et al., 2015), used entanglement in polarization. To explain its origin, one has to have a 
closer look into the possible non-linear crystals and their polarization properties. 


• In Type-II down-conversion, there exists a basis (H, V) such that one of the down-converted 
photons has the same polarization as the pump and the other one the opposite polarization. One 
can find suitable directions such that the phenomenological Hamiltonian reads 

$$H_{II} = g\left(\alpha_H\, a_H^\dagger b_V^\dagger + \alpha_V\, a_V^\dagger b_H^\dagger\right) \tag{B.1}$$

where $\alpha_{H,V}$ is the amplitude of the pump laser in that polarization mode and $g$ is a coupling 
constant proportional to $\chi^{(2)}$ and roughly proportional to the length of the crystal. Thus, 
the first excitation of the output field is the two-photon state $\cos\theta\,|H\rangle_a|V\rangle_b + \sin\theta\,|V\rangle_a|H\rangle_b$ 
with $\cos\theta = \frac{\alpha_H}{\sqrt{\alpha_H^2 + \alpha_V^2}}$ (in this elementary presentation we can assume the amplitudes to be 
real). This was the type of entanglement used in (Weihs et al., 1998). 

• In Type-I down-conversion, there exists a polarization direction $D$ such that pump light of 
that polarization may be converted into two photons with the orthogonal polarization $D^\perp$. 
The down-conversion process is captured phenomenologically by the Hamiltonian 

$$H_I = g\,\alpha_D\, a_{D^\perp}^\dagger b_{D^\perp}^\dagger . \tag{B.2}$$

Entanglement is produced by a serial arrangement of two crystals, the first aligned such 
that $D = H$, the second such that $D = V$. Then, for a pump polarized along $\cos\theta\, H + 
\sin\theta\, V$, the first excitation of the output field will be the two-photon state $\cos\theta\,|V\rangle_a|V\rangle_b + 
\sin\theta\,|H\rangle_a|H\rangle_b$. 


As first noticed by Franson (1989), the down-conversion process itself defines another type of 
entanglement, which has been called energy-time entanglement or, in its discrete version, time-bin 
entanglement. This type of entanglement was used for instance in (Tittel et al., 1998). The idea is 
the following: The coherence of the down-conversion process is defined by the coherence time of 
the pump $\tau_p$. However, conservation of energy requires that the two down-converted photons are 
created within a very short time $\tau_c$ from each other. If $\tau_p/\tau_c = N \gg 1$, which is usually the case, 
the output is multimode, so the phenomenological Hamiltonian (B.2) should rather read 

$$H_I \approx g\,\alpha \sum_{j=1}^{N} a_j^\dagger b_j^\dagger \tag{B.3}$$

where we omitted the mention of the polarization and opted for a discrete choice of modes 
for notational simplicity. It follows that the first excitation of the output field is the two-photon 
state $\frac{1}{\sqrt N}\sum_{j=1}^{N} |j\rangle_a |j\rangle_b$, where $j$ now refers to the time mode. This state is highly entangled; the only 
remaining challenge consists in reading that entanglement out. This can be done by unbalanced 
interferometers. Without describing the details, let me mention that the feasible implementations 
at the moment of writing cannot close the detection loophole, because the design of the 
interferometers has intrinsic losses.¹ 


B.2 Atomic Degrees of Freedom 


Trapped ions and atoms can be prepared in entangled states of their internal degrees of freedom 
(usually fine or hyperfine energy levels), after which one can observe violation of Bell inequalities. 
From the point of view of the loopholes, atomic systems are the opposite of photons: It’s easy to 
close the detection loophole but hard to close the locality loophole. 

The detection is based on driving a transition between one of the levels to be measured, 
say the ground state, and a third level. If the system was in the ground state, this process will 
generate easily detectable fluorescence. If the system was in the excited state, no light is emitted. 
Measurements in different bases can easily be effected by applying a Ramsey pulse before the 
detection (exactly the analog of rotating the polarization prior to measuring in a fixed basis). 
Because fluorescence detection is very efficient, the detection loophole was first closed with ions 
(Rowe et al., 2001). A Bell violation with entangled ions in different chambers led to the first 
demonstration of randomness generation (Pironio et al., 2010). 

In order to close the locality loophole, techniques had to be devised to entangle two very distant 
atomic systems. The obvious way to do so is to swap entanglement from a pair of entangled photons. 
While this proposal is easy to understand, its implementation is challenging. The main point is 
to herald the creation of the entanglement of the atomic systems, so that one performs the Bell 
test only when entanglement is known to have been created. Indeed, as we have seen, 
down-conversion is a probabilistic process; and even when entangled photons are produced, they may be 
lost in transmission or not couple to the atomic system. The first successful demonstration was the 
loophole-free Bell test of Delft, where the atomic degrees of freedom were electronic states of 
NV-centers in diamonds (Hensen et al., 2015). Weinfurter’s group later reported the same achievement 
with hyperfine states of neutral atoms (Rosenfeld et al., 2017). 


B.3 How to Address the Detection Loophole 


As discussed in subsection 1.5.2, from the point of view of the verifier, it is trivial to close the 
detection loophole: He just needs to elicit an answer for each query. The challenge is in the 
design of the experimental setup. The detection loophole is also crucial in device-independent 
characterization, notably because it can be activated by sheer mistake—it has actually happened, 
and was duly reported (Tasca et al., 2009). 

This section describes a toy model that elucidates how one addresses the detection loophole; a 
much more detailed modeling will be required for any specific setup. 


B.3.1 Setting up the notions 
Simple theoretical estimates address the detection loophole by studying a single figure of merit, 
usually called detection efficiency and denoted $\eta$. This figure of merit actually captures what is called 
system efficiency in the experimental jargon: It’s not the efficiency of the detector alone, but it 
includes the probability that the signal from the source is not lost before reaching the detector. 
To keep the number of parameters minimal, here we assume that $\eta$ is the same for all the detectors 
of both players. 

1 At some point, this obvious observation has given rise to a uselessly acrimonious discussion. I prefer to 
consign it, together with the corresponding references, to oblivion. 

For definiteness, we consider a measurement setup with two conclusive outputs (“detections”) 
and a third non-conclusive output (“no-detection”). The two conclusive outputs are naturally 
going to be associated with two outputs of the Bell test. But the players must give outputs in the 
Bell test even in the event of no-detection. There are two possible options: 


• “No-detection” is treated as a third output. This leads to a Bell scenario with $m_A = m_B = 3$. 

• “No-detection” is associated to one of the two outputs according to a pre-established recipe 
(often a deterministic assignment²). This leads to a Bell scenario with $m_A = m_B = 2$. 


In all the examples that have been reported, these two options lead to similar, when not identical, 
thresholds for closing the detection loophole. 

From now on, we focus on $m_A = m_B = 2$. The thresholds are comparable for other Bell 
scenarios with a small number of inputs and outputs; for significantly reduced thresholds, it seems 
that one would have to go to unrealistic implementations. I refer to subsection VII.B.1.c of 
(Brunner et al., 2014) for detailed references, and to (Pal and Vértesi, 2015) for further technical 
results. This is why all the experiments that closed the detection loophole performed to date have 
used the CHSH inequality. 


B.3.2 Efficiency for the (2, 2; 2, 2) scenario 


Suppose that Alice and Bob share a perfect maximally entangled state of two qubits and perform 
the measurements that saturate the Tsirelson bound. When both particles are detected, which 
happens with probability $\eta^2$, their data will show $S = 2\sqrt2$. When only one of the particles is 
detected and not the other, the data will show no correlations, because one of the outputs is the 
result of a measurement on half of a maximally entangled state, which is correlated only to its twin. 
So, a fraction $2\eta(1-\eta)$ of the data will contribute with $S = 0$. Finally, when neither particle is 
detected, the most one can hope for is the local bound $S = 2$. This is easy to achieve: The players 
may just decide to output deterministically +1 in case of no-detection. Thus, the total statistics 
will give 

$$S(\Psi^-, \eta) = \eta^2\, 2\sqrt2 + (1-\eta)^2\, 2 \tag{B.4}$$

and the condition $S(\Psi^-, \eta) > 2$ for violation translates into the threshold efficiency 

$$\eta > \eta_{th}(\Psi^-) = \frac{2}{1+\sqrt2} \approx 82.8\%. \tag{B.5}$$


For good measure let us redo the estimate of the threshold efficiency with the CH form (2.30) of 
the CHSH inequality (as mentioned in subsection 2.5.2, this was the original calculation). Since 
this inequality only involves the outputs $a = b = 0$, one can group the no-detection events with 
the events $a = 1$ on Alice and $b = 1$ on Bob; but the number of such events must be counted 
in the denominator when computing frequencies. Then one would observe 
$P(0,0|x,y) = \eta^2\,\frac14\left(1 + \frac{(-1)^{xy}}{\sqrt2}\right)$ and $P(a=0|x=0) = P(b=0|y=0) = \eta/2$, leading to 
$S_{CH} = \eta^2\,\frac{1+\sqrt2}{2} - \eta$. As expected, the threshold efficiency for violation $S_{CH} > 0$ is the same as (B.5). 

2 In this case, in setups where each output is associated to a detector (as is the case for photons), one can 
have only one detector per player: Whenever it does not click, the other output is recorded. 
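
The two estimates can be checked numerically. The following is a minimal sketch under the toy model of this subsection: the CHSH value is taken as η²·2√2 + (1 − η)²·2 and the CH value as η²(1 + √2)/2 − η; the function names are illustrative.

```python
import math

def chsh_with_losses(eta):
    """CHSH value: singlet at the Tsirelson bound, no-detection output as +1."""
    return eta**2 * 2 * math.sqrt(2) + (1 - eta)**2 * 2

def ch_with_losses(eta):
    """CH value eta^2 (1 + sqrt(2))/2 - eta; a violation corresponds to a positive value."""
    return eta**2 * (1 + math.sqrt(2)) / 2 - eta

# Threshold obtained by solving chsh_with_losses(eta) = 2 for eta > 0.
eta_th = 2 / (1 + math.sqrt(2))
print(f"threshold efficiency: {eta_th:.4f}")

for eta in (0.80, eta_th, 0.90):
    print(f"eta = {eta:.4f}: S = {chsh_with_losses(eta):.4f}, "
          f"S_CH = {ch_with_losses(eta):+.4f}")
```

Both expressions cross their local bound at the same efficiency, as the text states: below it S < 2 and S_CH < 0, above it both inequalities are violated.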


B.3.3 A surprising optimization 


As we have seen, assuming that the players share a maximally entangled state leads to the maximal 
value of S by the two-detection events, but implies a contribution S = 0 by the one-detection 
events. Surprisingly, Eberhard (1993) found that it is beneficial to trade a decrease in the first for 
an increase in the second. We present this result using the usual form of CHSH, although of course 
in the original paper it was found using the Eberhard form (2.33). 

Consider non-maximally entangled pure two-qubit states (3.15) and leave the measurement 
directions free to start with. In a similar way as previously mentioned, the value of the CHSH 
parameter takes the form 

$$S(\psi(\theta), \eta) = \eta^2 S_2 + 2\eta(1-\eta) S_1 + (1-\eta)^2\, 2 \tag{B.6}$$

where the values of the $S_j$ depend on the measurement directions. Solving $S > 2$ for $\eta$, the threshold 
efficiency for every $\theta$ is then given by 

$$\eta_{th}(\theta) = \min_{a_0, a_1, b_0, b_1} \frac{4 - 2S_1}{S_2 - 2S_1 + 2}. \tag{B.7}$$

The numerical optimization shows that $\eta_{th}(\theta)$ decreases monotonically as $\theta$ decreases, from (B.5) for 
$\theta = \pi/4$ down to $\eta_{th}(\theta \to 0) = 2/3$. Besides, for $\theta < \pi/4$, the measurement settings that minimize 
the threshold efficiency are not those that give the maximal value (3.17) of $S_2$. Thus, we reach the 
conclusion that the detection loophole for CHSH is best closed with weakly entangled states³ and 
for measurements that do not maximize the contribution of the two-detection events. 
The two loophole-free Bell tests conducted with photons used the Eberhard trick (Giustina 
et al., 2015; Shalm et al., 2015), whereas in the one conducted with NV-centers the efficiencies 
were high enough to detect the violation by a maximally entangled state (Hensen et al., 2015). 
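
The trend found by Eberhard can be reproduced with a rough numerical sketch. Several ingredients below are assumptions made for illustration, not taken from the original paper: the state is written as cos θ|00⟩ + sin θ|11⟩, the measurements are restricted to the x–z plane, no-detection is output as +1 (so that the one-detection term is S_1 = ⟨A_0⟩ + ⟨B_0⟩), and the threshold for given settings is (4 − 2S_1)/(S_2 − 2S_1 + 2), obtained by solving S > 2. The optimizer is a plain compass search, which only returns a local optimum and hence an upper bound on the true threshold.

```python
import math

def threshold(theta, angles):
    """Efficiency above which S > 2, for cos(theta)|00> + sin(theta)|11>.

    Measurements are cos(x) sigma_z + sin(x) sigma_x; no-detection is output as +1.
    """
    a0, a1, b0, b1 = angles
    c, s = math.cos(2 * theta), math.sin(2 * theta)

    def corr(a, b):  # two-party correlator <A (x) B> on the state
        return math.cos(a) * math.cos(b) + s * math.sin(a) * math.sin(b)

    s2 = corr(a0, b0) + corr(a0, b1) + corr(a1, b0) - corr(a1, b1)
    s1 = c * (math.cos(a0) + math.cos(b0))  # <A_0> + <B_0>
    if s2 <= 2:  # no violation even at perfect efficiency
        return 1.0
    return (4 - 2 * s1) / (s2 - 2 * s1 + 2)

def compass_search(f, x, step=0.5, tol=1e-8, max_iter=100_000):
    """Derivative-free local minimisation: try +/- step on each coordinate in turn."""
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                y = list(x)
                y[i] += delta
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            step /= 2
    return x, fx

def best_threshold(theta):
    """Upper bound on the threshold efficiency for a given theta."""
    # Start from settings that give S_2 = 2 sqrt(1 + sin^2(2 theta)) > 2.
    beta = math.atan(math.sin(2 * theta))
    start = [0.0, math.pi / 2, beta, -beta]
    return compass_search(lambda x: threshold(theta, x), start)[1]

for theta in (math.pi / 4, 0.5, 0.3, 0.1):
    print(f"theta = {theta:.3f}: eta_th <= {best_threshold(theta):.4f}")
```

For θ = π/4 the search stays at the Tsirelson settings and recovers the threshold (B.5); for smaller θ it finds settings with a lower threshold, always above 2/3.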


3 It must be stressed that we are speaking of pure states, not of any weakly entangled state. In practice, if $\theta$ is 
too small, a low amount of noise will be enough to prevent any violation. 


Appendix C 
Notions of Quantum Theory 
Used in this Book 


Undergraduate quantum physics is taken for granted, including the notation, the usage and the 
meaning of those mathematical objects (e.g., bras and kets, Hermitian operators, unitary operators, 
projectors ...). This appendix lists the non-elementary notions of quantum kinematics that are 
used in the main text. For more comprehensive definitions and all the proofs, readers can refer to¹ 
(Nielsen and Chuang 2000; Schumacher and Westmoreland 2010; Watrous 2018). 


C.1 States and Measurements 


C.1.1 Definition of quantum theory 


The content of this subsection is among the basic material that is supposed to be known. It 
is sketched here because it is directly related to some discussions in the main text, notably 
chapter 10. 

Quantum theory is defined by the fact that physical systems are described by vector spaces. 
Barring foundational discussions, these vector spaces are complex. The finite-dimensional ones 
of a given dimension $d$ are all isomorphic and denoted $\mathbb{C}^d$. Only one infinite-dimensional vector 
space is used in quantum theory, the space of square-integrable functions $L^2(\mathbb{R})$. Practitioners 
refer to all these vector spaces as Hilbert spaces, borrowing the name of the infinite-dimensional 
case.² A composite system is described by the tensor product of the Hilbert spaces of 
the subsystems. 

Physical properties are represented by subspaces of the vector space, with the rule that 
distinguishable physical properties are associated to orthogonal subspaces. A pure state is a state of 
maximal knowledge, so it corresponds to the smallest non-trivial subspace: A one-dimensional 
ray. This ray can be unambiguously represented by the corresponding projector. Alternatively, 
one can represent it with a vector $|\psi\rangle$ lying on it, keeping in mind that any vector that differs from it by a 
nonzero multiplicative constant represents the same state. A state $\rho$ that does not describe maximal knowledge is called 
mixed state because it can always be represented as a convex sum of projectors on pure states. 


1 At the time of writing, Watrous’ book is available online: https://cs.uwaterloo.ca/~watrous/TQI/. Another 
classic resource, to date unpublished, is Preskill’s lecture notes, available at 
http://www.theory.caltech.edu/people/preskill/ph229/. 

2 Mathematicians did not find it necessary to give a name to such a trivial object as a finite-dimensional 
vector space. Also, to be precise, the infinite-dimensional vector space is a rigged Hilbert space. 
By contrast, in a classical theory, physical systems are represented by sets, properties by subsets, 
pure states by points. In particular, given two properties that are not mutually exclusive, one can 
always find states that possess both properties with certainty: The points that lie in the intersection 
of the corresponding sets. By contrast, the intersection of two non-orthogonal vector subspaces 
may be just one point: Thus, in quantum theory, given two properties, there may be no state that 
possesses both with certainty. 


C.1.2 Tomography 


The state $\rho$ describes all the properties of the system. Everyone knows that the expectation value 
of any operator can be computed given the state. But the converse is also true: A sufficiently large 
set of expectation values must determine the state (otherwise, there would be information in $\rho$ that 
no operator can capture). From the perspective of probability theory, tomography is the analog of 
reconstructing a probability distribution by sampling it; only here, one must reconstruct several 
distributions in order to reconstruct the state, because there exist incompatible measurements. 

For instance, the state of one qubit can always be written 

$$\rho = \frac12\left(\mathbb{1} + \vec m\cdot\vec\sigma\right) \tag{C.1}$$

with $|\vec m| \le 1$ and where $\vec\sigma$ are the Pauli matrices. So the state is fully known if the Bloch vector $\vec m$ is. 
Now, the component of the Bloch vector along direction $\hat n$ is $\hat n\cdot\vec m = \mathrm{Tr}[\rho\,(\hat n\cdot\vec\sigma)] = \langle \hat n\cdot\vec\sigma\rangle$. Thus, 
in order to reconstruct $\rho$, it is enough to compute $\langle\sigma_x\rangle$, $\langle\sigma_y\rangle$ and $\langle\sigma_z\rangle$. 
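
As an illustration of single-qubit tomography, the sketch below extracts the Bloch vector of a density matrix from the three Pauli expectation values and rebuilds the state from it. The density matrix used in the example and the function names are hypothetical; in an experiment the three expectation values would be estimated from measured frequencies rather than computed from a known state.

```python
def bloch_vector(rho):
    """Return (<sigma_x>, <sigma_y>, <sigma_z>) = Tr(rho sigma_k) for a 2x2 density matrix."""
    (r00, r01), (r10, r11) = rho
    m_x = (r01 + r10).real
    m_y = (1j * (r01 - r10)).real
    m_z = (r00 - r11).real
    return m_x, m_y, m_z

def state_from_bloch(m):
    """Rebuild rho = (1 + m . sigma) / 2 from the Bloch vector."""
    m_x, m_y, m_z = m
    return [[(1 + m_z) / 2, (m_x - 1j * m_y) / 2],
            [(m_x + 1j * m_y) / 2, (1 - m_z) / 2]]

# Example state (Hermitian, unit trace, Bloch vector of length 0.6).
rho = [[0.7, 0.2 - 0.1j],
       [0.2 + 0.1j, 0.3]]

m = bloch_vector(rho)
rebuilt = state_from_bloch(m)
print("Bloch vector:", m)
print("rebuilt state:", rebuilt)
```

The rebuilt matrix coincides with the original one, which is the statement that three expectation values fix the state of a qubit.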

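Similarly, the state of two qubits can always be written as 

$$\rho = \frac14\Big(\mathbb{1}\otimes\mathbb{1} + \vec m_A\cdot\vec\sigma\otimes\mathbb{1} + \mathbb{1}\otimes\vec m_B\cdot\vec\sigma + \sum_{i,j=x,y,z} T_{ij}\,\sigma_i\otimes\sigma_j\Big) \tag{C.2}$$

and so it can be reconstructed by reconstructing the local Bloch vectors and the correlations $T_{ij} = \langle\sigma_i\otimes\sigma_j\rangle$. 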


C.1.3 Composite systems: Partial traces and purification 


Given the state of a composite system, one can always infer the state of each of its subsystems. 
Let’s consider bipartite systems for simplicity of notation: Given the state $\rho_{AB}$, we want to know 
which state $\rho_A$ describes the observations made by Alice. In other words, we want 

$$\mathrm{Tr}(\rho_A A) = \mathrm{Tr}[\rho_{AB}(A\otimes\mathbb{1}_B)] \tag{C.3}$$

for all operators $A$ on Alice’s system. Since $\mathbb{1}_B = \sum_k |\beta_k\rangle\langle\beta_k|$ where the $\{|\beta_k\rangle\}$ are an arbitrary 
orthonormal basis of Bob’s Hilbert space, one finds that the partial state is computed by a partial trace: 

$$\rho_A = \mathrm{Tr}_B(\rho_{AB}). \tag{C.4}$$

States of composite systems are called product states if $\rho_{AB} = \rho_A \otimes \rho_B$. All the other states show 
correlations. Since the trace is basis-invariant, the kinematic correlations in quantum states are 
no-signaling: Whatever Bob does on his system, Alice will see the same $\rho_A$. 
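
The partial trace is a simple index contraction, as the following sketch for two qubits shows. The function names are illustrative; the composite basis is assumed to be ordered as |00⟩, |01⟩, |10⟩, |11⟩, with Alice’s index first.

```python
import math

def partial_trace_bob(rho_ab, d_a=2, d_b=2):
    """Trace out Bob: (rho_A)[i][j] = sum_k rho_AB[i*d_b + k][j*d_b + k]."""
    return [[sum(rho_ab[i * d_b + k][j * d_b + k] for k in range(d_b))
             for j in range(d_a)]
            for i in range(d_a)]

def projector(vec):
    """|v><v| for a state given as a list of amplitudes."""
    return [[x * y.conjugate() for y in vec] for x in vec]

# Singlet (|01> - |10>)/sqrt(2) in the basis |00>, |01>, |10>, |11>.
singlet = [0, 1 / math.sqrt(2), -1 / math.sqrt(2), 0]
rho_a = partial_trace_bob(projector(singlet))
print(rho_a)
```

For the singlet, Alice’s partial state is the maximally mixed state: all the information sits in the correlations.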
There is a complementary view to what we have just seen that is often useful. Consider a single 
system in a mixed state $\rho$: This state can be seen as the partial state of a composite system, and the 
composite system could be in a pure state. The construction of this purification is simple starting 
from the diagonal representation of $\rho$: If $\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$, we have $\rho = \mathrm{Tr}_A(|\Psi\rangle\langle\Psi|)$, where the 
trace is taken over the purifying system, with 

$$|\Psi\rangle = \sum_i \sqrt{p_i}\,|\psi_i\rangle \otimes |\beta_i\rangle \tag{C.5}$$

where the $|\beta_i\rangle$ are mutually orthogonal. It can be proved that all purifications of a given state $\rho$ 
are equivalent up to a local unitary on the purifying system (in other words, the only difference is 
which set of $|\beta_i\rangle$ is chosen). 
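
The claim that the purification reproduces the mixed state is a one-line computation. The check below assumes that the states of the purifying system are normalized as well as mutually orthogonal, and writes the trace over the purifying system as Tr_A.

```latex
\begin{aligned}
\mathrm{Tr}_A\big(|\Psi\rangle\langle\Psi|\big)
  &= \sum_{i,j} \sqrt{p_i p_j}\; |\psi_i\rangle\langle\psi_j| \;\langle\beta_j|\beta_i\rangle \\
  &= \sum_{i} p_i\, |\psi_i\rangle\langle\psi_i| \;=\; \rho .
\end{aligned}
```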


C.2 Elementary Entanglement Theory 


Entanglement theory was mostly developed in the late 1990s and the first decade of the 
twenty-first century. The Horodecki family made a commendable effort to compile the massive amount of 
knowledge generated during those days. Their review article (Horodecki et al., 2009) is not what 
I would call pleasant reading, but is a unique resource for its comprehensiveness. 

In this elementary survey, we give the definition of entanglement for an arbitrary number of 
subsystems, but focus on bipartite systems for all the subsequent considerations. 


C.2.1 Entangled states 


A pure state of a composite system $\bigotimes_n \mathcal{H}_n$ is called entangled if it cannot be written as a product: 

$$|\Psi\rangle \neq \bigotimes_{n=1}^{N} |\psi_n\rangle. \tag{C.6}$$

For bipartite pure states, there exists a canonical representation called Schmidt decomposition: There 
always exist two orthonormal bases $\{|\alpha_j\rangle \,|\, j = 1,\ldots,d_A\}$ and $\{|\beta_k\rangle \,|\, k = 1,\ldots,d_B\}$ such that 

$$|\Psi\rangle = \sum_{i=1}^{\min(d_A, d_B)} \sqrt{p_i}\,|\alpha_i\rangle \otimes |\beta_i\rangle \tag{C.7}$$

with $p_i \ge 0$ for all $i$ and $\sum_i p_i = 1$. For multipartite pure states, several generalizations have been 
studied, see e.g., (Acin et al., 2001) for three qubits. 

A mixed state is called entangled if there exists no convex decomposition over product states: 

$$\rho \neq \sum_m p_m \Big(\bigotimes_{n=1}^{N} \rho_n^{(m)}\Big) \tag{C.8}$$

with $p_m \ge 0$ and $\sum_m p_m = 1$. A non-entangled state is called separable; often, a non-product 
separable state is called classically-correlated state. 
While it is trivial to verify if a pure state is entangled or not, for a generic mixed state no efficient 
general algorithm is known. One can prove that a state is separable by exhibiting an explicit decomposition. As for 
proving that a state is entangled, we know only some sufficient criteria. The most used one is the 
negativity of the partial transpose. Let us illustrate it for two-qubit Werner states. The state (3.23) reads 

$$\rho_W = \frac{1+W}{4}\big(|01\rangle\langle 01| + |10\rangle\langle 10|\big) + \frac{1-W}{4}\big(|00\rangle\langle 00| + |11\rangle\langle 11|\big) - \frac{W}{2}\big(|01\rangle\langle 10| + |10\rangle\langle 01|\big).$$

If we transpose Bob’s side of the matrix, we get 

$$\rho_W^{T_B} = \frac{1+W}{4}\big(|01\rangle\langle 01| + |10\rangle\langle 10|\big) + \frac{1-W}{4}\big(|00\rangle\langle 00| + |11\rangle\langle 11|\big) - \frac{W}{2}\big(|00\rangle\langle 11| + |11\rangle\langle 00|\big).$$

The smallest eigenvalue of $\rho_W^{T_B}$ is $\frac14(1 - 3W)$, which is negative for $W > \frac13$. Thus, every Werner 
state with $W > \frac13$ is entangled. 
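
The negativity of the partial transpose can be checked without an eigenvalue solver: exhibiting one vector with a negative expectation value is enough to prove that an operator is not positive. The sketch below assumes the usual parametrization of the Werner state as W|Ψ⁻⟩⟨Ψ⁻| + (1 − W)𝟙/4 and uses the vector (|00⟩ + |11⟩)/√2; the function names are illustrative.

```python
import math

def werner_state(w):
    """rho_W = w |Psi-><Psi-| + (1 - w) 1/4 in the basis |00>, |01>, |10>, |11>."""
    psi = [0.0, 1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]
    return [[w * psi[r] * psi[c] + ((1 - w) / 4 if r == c else 0.0)
             for c in range(4)] for r in range(4)]

def partial_transpose_bob(rho):
    """Swap Bob's row and column indices: <i k|rho|j l> -> <i l|rho|j k>."""
    out = [[0.0] * 4 for _ in range(4)]
    for i in range(2):
        for k in range(2):
            for j in range(2):
                for l in range(2):
                    out[2 * i + l][2 * j + k] = rho[2 * i + k][2 * j + l]
    return out

def expectation(op, vec):
    """<v|op|v> for a real vector v."""
    return sum(vec[r] * op[r][c] * vec[c] for r in range(4) for c in range(4))

# Candidate eigenvector of the partial transpose: (|00> + |11>)/sqrt(2).
v = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]

for w in (0.2, 1 / 3, 0.5, 1.0):
    value = expectation(partial_transpose_bob(werner_state(w)), v)
    print(f"W = {w:.3f}: <v|rho^TB|v> = {value:+.4f}   (1 - 3W)/4 = {(1 - 3 * w) / 4:+.4f}")
```

The expectation value reproduces (1 − 3W)/4 and changes sign at W = 1/3.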

For systems composed of two qubits, or of one qubit and one qutrit, the negativity of partial 
transpose is also necessary for entanglement. For any other composite systems, there exist mixed 
entangled states with positive partial transpose (PPT). These are usually called bound entangled 
states for a reason that will be mentioned in subsection C.2.3. 


C.2.2 Witnessing entanglement 


One may not have to reconstruct the full state before assessing whether entanglement is present. 
An entanglement witness is an operator $W$ such that $\mathrm{Tr}(\rho W) \ge w_{sep}$ for all separable states, while 
$\mathrm{Tr}(\rho W) < w_{sep}$ for some entangled states. The analogy with Bell inequalities is evident: In fact, Bell 
operators are the only witnesses that detect entanglement in a device-independent way. 

Here comes an example of an entanglement witness that is not a Bell operator. Consider two 
qubits. Given a product state $\frac14(\mathbb{1} + \vec m_A\cdot\vec\sigma)\otimes(\mathbb{1} + \vec m_B\cdot\vec\sigma)$, it is readily verified that 
$\langle\sigma_x\otimes\sigma_x\rangle + \langle\sigma_y\otimes\sigma_y\rangle + \langle\sigma_z\otimes\sigma_z\rangle = \vec m_A\cdot\vec m_B$, whose absolute value is at most 1. 
Separable states are convex sums of product states: Therefore, for 

$$W = \sigma_x\otimes\sigma_x + \sigma_y\otimes\sigma_y + \sigma_z\otimes\sigma_z \tag{C.9}$$

it holds $-1 \le \mathrm{Tr}(\rho W) \le 1$ for all separable states. The singlet state has $\mathrm{Tr}(\rho W) = -3$: Thus, $W$ 
detects its entanglement; and it is a simple exercise to change the signs in the expression of $W$ to 
detect the entanglement of other Bell states. 
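
The values quoted for this witness are easy to verify numerically. The sketch below builds the operator σ_x⊗σ_x + σ_y⊗σ_y + σ_z⊗σ_z as a 4×4 matrix with a hand-written Kronecker product and evaluates it on the singlet and on two product states; all function names are illustrative.

```python
import math

SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]

def kron(a, b):
    """Kronecker product of two 2x2 matrices, as a 4x4 nested list."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]

def witness():
    """W = sx (x) sx + sy (x) sy + sz (x) sz."""
    terms = [kron(s, s) for s in (SX, SY, SZ)]
    return [[sum(t[r][c] for t in terms) for c in range(4)] for r in range(4)]

def expectation(op, vec):
    """<v|op|v> for a pure state given as a list of amplitudes."""
    return sum(complex(vec[r]).conjugate() * op[r][c] * vec[c]
               for r in range(4) for c in range(4)).real

W = witness()
singlet = [0, 1 / math.sqrt(2), -1 / math.sqrt(2), 0]
product_00 = [1, 0, 0, 0]   # |0>|0>, Bloch vectors parallel
product_01 = [0, 1, 0, 0]   # |0>|1>, Bloch vectors antiparallel

print("singlet:", expectation(W, singlet))
print("|00>:   ", expectation(W, product_00))
print("|01>:   ", expectation(W, product_01))
```

The two product states reach the extreme separable values +1 and −1, while the singlet gives −3.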

Now, what would $W$ look like in a device-independent description? If the measurements are 
uncharacterized, all that we know is that there are three dichotomic measurements per player: So 
we’d have $W = A_0 B_0 + A_1 B_1 + A_2 B_2$, which can take any value between $-3$ and $+3$ for LV. Thus, for 
$W$ to be an entanglement witness, characterization is crucial: The system must be a qubit and the 
operators must be mutually unbiased Pauli matrices. For an example in which the operators can 
be uncharacterized, but one still needs the qubit assumption, see Appendix F1.3. 
C.2.3 Entanglement as a resource 


The explosion in the study of entanglement theory was triggered by the realization that 
entanglement is a resource and the subsequent effort to quantify this resource. This quantification was 
phrased in terms of interconversion of resources. One first notices that entanglement cannot 
increase under local operations (LO), nor if classical communication (CC) among the players 
is allowed. Thus, one can ask: Having $N$ copies of a given entangled state $\rho$, how many copies $M$ 
of another entangled state $\rho'$ can I obtain by a processing that involves only LOCC? 

In particular, taking the singlet state of two qubits as unit of entanglement, two basic tasks and the 
corresponding amount of entanglement can be defined: 


• Formation: Having shared $N_s$ two-qubit singlets, one wants to create $M$ copies of a 
desired entangled state $\rho$. The entanglement of formation of $\rho$ is then defined as 
$E_F(\rho) = \lim_{M\to\infty} \frac{N_s}{M}$. Any entangled state has $E_F(\rho) > 0$. 

• Distillation: Given $N$ copies of an entangled state $\rho$, one wants to extract $M_s$ two-qubit 
singlets. The entanglement of distillation (or distillable entanglement) of $\rho$ is then defined as 
$E_D(\rho) = \lim_{N\to\infty} \frac{M_s}{N}$. A state for which $E_D(\rho) > 0$ is called distillable. 

For pure states, all quantifiers of bipartite entanglement coincide: $E = S(\rho_A) = S(\rho_B)$ where $S$ 
is von Neumann entropy. In particular, a maximally entangled state of two qudits is one whose 
reduced states are $\rho_A = \rho_B = \mathbb{1}/d$. By Schmidt decomposition, there exists a local basis in which 
these states read 

$$|\Psi\rangle_{ME} = \frac{1}{\sqrt d} \sum_{k=1}^{d} |k\rangle \otimes |k\rangle. \tag{C.10}$$
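
For pure states the amount of entanglement is thus a simple function of the Schmidt weights. The sketch below computes the von Neumann entropy of the reduced state in bits, so that the two-qubit singlet counts as one unit; the function name is illustrative.

```python
import math

def entanglement_entropy(schmidt_probs):
    """Von Neumann entropy (in bits) of the reduced state, from the Schmidt weights p_i."""
    return -sum(p * math.log2(p) for p in schmidt_probs if p > 0)

print("product state:          ", entanglement_entropy([1.0]))
print("two-qubit singlet:      ", entanglement_entropy([0.5, 0.5]))
print("weakly entangled state: ", entanglement_entropy([0.9, 0.1]))
print("maximally entangled d=4:", entanglement_entropy([0.25] * 4))
```

A maximally entangled state of two qudits has d equal Schmidt weights and therefore log₂ d units of entanglement.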


For mixed states, some cases aside, Murphy’s law kicks in and everything that could go wrong does. 
On the one hand, one does not have simple computable formulas because the limits cannot be 
resolved analytically. On the other hand, various quantifiers of entanglement do not coincide. The 
most striking example is the existence of states for which $E_F(\rho) > 0$ but $E_D(\rho) = 0$: Entanglement 
must be invested to create them, but no entanglement can be retrieved out of them. These 
non-distillable entangled states have been called bound-entangled. All PPT entangled states are 
bound-entangled;³ whether some states with negative partial transpose (NPT) are also bound entangled 
remains open in spite of considerable effort. 


C.3 Generalized Measurements or POVMs 


C.3.1 Definition 


3 Indeed, it’s not difficult to verify that LOCC cannot change PPT into NPT. But the singlet is obviously 
NPT: Therefore, it will be impossible to distill even one singlet from any number of copies of a PPT state under 
LOCC. 

The reader must be familiar with projective measurements, in which each output $a$ is associated to 
a subspace of the Hilbert space of the system being measured, and therefore $P(a) = \mathrm{Tr}(\rho\,\Pi_a)$ 
where $\Pi_a$ is the projector on that subspace. There are two ways in which people have thought of 
generalizing this notion: 


• At the algebraic level, one can think that a recipe like $P(a) = \mathrm{Tr}(\rho\,\Pi_a)$ produces valid 
probabilities for every state as long as $\Pi_a \ge 0$ for all $a$ and $\sum_a \Pi_a = \mathbb{1}$. A set of orthogonal 
projectors satisfies these relations, but in addition they satisfy $\Pi_a^2 = \Pi_a$: This need not 
be enforced. This mathematical generalization explains why generalized measurements have 
been given the unfortunate, but by now standard, name of positive-operator valued measures 
(POVMs). 

• At the physical level: Upon receiving the system to be measured, one can append an auxiliary 
system⁴ prepared in a given state, then perform a projective measurement on the joint system. 
Because the auxiliary system is in a given state known by the user, this procedure can be 
legitimately called a measurement on the system to be measured, since information gain will 
be only about that one. 

The two definitions define the same class of measurements, an equivalence result known as Naimark’s 
theorem: Any POVM can be seen as a projective measurement on an enlarged system; and given 
a projective measurement on an enlarged system, its effective description on the system to be 
measured is a POVM. 


C.3.2 A semantic subtlety 


In the text, we have often introduced operators that are linear combinations of POVM elements. 
For instance, in section 3.2 we have operators of the form A = 1,1 —I_1. These are convenient 
because Tr(pA) gives immediately the marginal bias P(a = +1) — P(a = —1). 

In a frequent abuse of language, one often takes these operators as the real thing and says “Alice
measures $A$,” certainly inspired by the very frequent quantum phrase “Alice measures $\sigma_z$.” As
usual, imprecise language is fine as long as one knows its limits. Here I want to point out that the
eigenvalues of $A$ are $\{-1, +1\}$ only if the measurement is projective. Also, we need to recall that the
labeling of the POVM outcomes is arbitrary: The same $A$ could have been written $A = \Pi_0 - \Pi_1$,
and then $\mathrm{Tr}(\rho A) = P(a = 0) - P(a = 1)$, while 0 and 1 have nothing to do with the eigenvalues
of $A$ even in the case of projective measurements.
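The remark about eigenvalues can be made concrete with a short sketch. It assumes a noisy two-outcome measurement with elements $\frac{1}{2}(I \pm \eta\sigma_x)$; the value $\eta = 0.6$ and the test state are arbitrary illustrative choices. The operator $A = \Pi_{+1} - \Pi_{-1}$ then has eigenvalues $\pm\eta$, while $\mathrm{Tr}(\rho A)$ still equals the bias $P(a=+1) - P(a=-1)$.

```python
import math

def eigenvalues_sym(m):
    """Eigenvalues of a real symmetric 2x2 matrix, in increasing order."""
    mean = (m[0][0] + m[1][1]) / 2
    radius = math.hypot((m[0][0] - m[1][1]) / 2, m[0][1])
    return (mean - radius, mean + radius)

def unsharp_elements(eta):
    """Pi_{+1}, Pi_{-1} = (I +/- eta*sigma_x)/2; eta = 1 is the projective case."""
    plus = ((0.5, 0.5 * eta), (0.5 * eta, 0.5))
    minus = ((0.5, -0.5 * eta), (-0.5 * eta, 0.5))
    return plus, minus

def bias_operator(eta):
    """A = Pi_{+1} - Pi_{-1}."""
    plus, minus = unsharp_elements(eta)
    return tuple(tuple(plus[i][j] - minus[i][j] for j in range(2))
                 for i in range(2))

def trace_product(rho, op):
    """Tr(rho op) for 2x2 matrices."""
    return sum(rho[i][k] * op[k][i] for i in range(2) for k in range(2))

rho = ((0.5, 0.5), (0.5, 0.5))  # |+><+|, the +1 eigenstate of sigma_x

spectrum_projective = eigenvalues_sym(bias_operator(1.0))
spectrum_noisy = eigenvalues_sym(bias_operator(0.6))

plus, minus = unsharp_elements(0.6)
bias_from_probs = trace_product(rho, plus) - trace_product(rho, minus)
bias_from_operator = trace_product(rho, bias_operator(0.6))
```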


C.3.3 POVMs and joint measurability 


While examples of relevant POVMs are readily found in the literature, one issue that is usually 
tackled only by specialists is that of joint measurability. Since it was mentioned in the text, we 
have to introduce it. A widespread misunderstanding of quantum measurement theory alleges 
that one cannot measure position and momentum “at the same time.” This is nonsense: One 
can obviously construct a box that outputs both a value for position and one for momentum: 
Only, the information that can be extracted will be less precise than what could be achieved by 


measuring either position or momentum.⁵ It is in this sense that position and momentum are not
jointly measurable.

4 Auxiliary systems have been called ancillae for a long time in the literature. The word means maidservants
in Latin, and triggered some backlash recently.

In general, projective measurements are jointly measurable if and only if they commute. But
joint measurability is not so trivial for POVMs, to the point that a complete set of criteria for
joint measurability is not known. For our goals, let us just look at the simplest example. Consider
the two POVMs $\mathcal{M}_1 = \{\Pi^1_\pm = \frac{1}{2}(I \pm \eta\sigma_z)\}$ and $\mathcal{M}_2 = \{\Pi^2_\pm = \frac{1}{2}(I \pm \eta\sigma_x)\}$. Obviously, the elements
of $\mathcal{M}_1$ do not commute with those of $\mathcal{M}_2$ as soon as $\eta > 0$. Nonetheless, for $\eta \leq \frac{1}{\sqrt{2}}$ they are
jointly measurable. Indeed, consider the four-output POVM defined by

$$\Pi_{a,a'} = \frac{1}{4}\left(I + \frac{a\,\sigma_z + a'\,\sigma_x}{\sqrt{2}}\right), \qquad a, a' \in \{+1, -1\}.$$

It holds $\Pi^1_a = \Pi_{a,+} + \Pi_{a,-}$ and $\Pi^2_{a'} = \Pi_{+,a'} + \Pi_{-,a'}$ for $\eta = \frac{1}{\sqrt{2}}$. For smaller values of $\eta$, one can
just add noise to the four-output apparatus. In other words, one can have a single measurement
device with four outputs, such that a clever grouping of the outputs reproduces $\mathcal{M}_1$ and $\mathcal{M}_2$
exactly.
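The joint-measurability claim lends itself to a direct numerical check. The sketch below assumes the four-output parent POVM $\Pi_{a,a'} = \frac{1}{4}\big(I + (a\sigma_z + a'\sigma_x)/\sqrt{2}\big)$ and the noisy measurements $\frac{1}{2}(I \pm \eta\sigma_z)$, $\frac{1}{2}(I \pm \eta\sigma_x)$ at $\eta = 1/\sqrt{2}$; function names are illustrative. It confirms that the parent is a valid POVM, that grouping its outputs reproduces both measurements, and that the two measurements do not commute.

```python
import math

I2 = ((1.0, 0.0), (0.0, 1.0))
SZ = ((1.0, 0.0), (0.0, -1.0))
SX = ((0.0, 1.0), (1.0, 0.0))
ETA = 1 / math.sqrt(2)

def comb(*terms):
    """Linear combination sum_k c_k * M_k, with terms given as (c_k, M_k) pairs."""
    return tuple(tuple(sum(c * m[i][j] for c, m in terms) for j in range(2))
                 for i in range(2))

def matmul(m1, m2):
    return tuple(tuple(sum(m1[i][k] * m2[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def min_eigenvalue(m):
    """Smallest eigenvalue of a real symmetric 2x2 matrix."""
    mean = (m[0][0] + m[1][1]) / 2
    return mean - math.hypot((m[0][0] - m[1][1]) / 2, m[0][1])

def close(m1, m2, tol=1e-12):
    return all(abs(m1[i][j] - m2[i][j]) < tol
               for i in range(2) for j in range(2))

def unsharp(sigma, sign, eta=ETA):
    """Two-outcome POVM element (I + sign*eta*sigma)/2."""
    return comb((0.5, I2), (0.5 * sign * eta, sigma))

def joint(a, a_prime):
    """Assumed parent element (I + (a*sigma_z + a'*sigma_x)/sqrt(2))/4."""
    return comb((0.25, I2), (0.25 * a * ETA, SZ), (0.25 * a_prime * ETA, SX))

signs = (+1, -1)
parent = {(a, ap): joint(a, ap) for a in signs for ap in signs}

is_povm = (close(comb(*[(1.0, p) for p in parent.values()]), I2)
           and all(min_eigenvalue(p) > -1e-12 for p in parent.values()))

# Summing over a' reproduces M_1; summing over a reproduces M_2.
marginals_ok = all(
    close(comb((1.0, parent[(s, +1)]), (1.0, parent[(s, -1)])), unsharp(SZ, s))
    and close(comb((1.0, parent[(+1, s)]), (1.0, parent[(-1, s)])), unsharp(SX, s))
    for s in signs
)

# The elements of M_1 and M_2 nevertheless do not commute.
p1, p2 = unsharp(SZ, +1), unsharp(SX, +1)
commute = close(matmul(p1, p2), matmul(p2, p1))
```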


5 Here is a very simple construction: Hide in the measurement device a box that measures position and one 
that measures momentum; in each round, the device tosses an unbiased coin and routes the physical system to 
one of the boxes, while instructing the other box to output something. It does not look optimal and indeed it is 
not; but it does extract some information about position and momentum, and is thus a legitimate measurement 
device for both. There exist bounds that quantify the loss of information independently of the constructions, 
usually going under the name of Arthurs-Kelly uncertainty relations. 


Appendix D 
LV Models for Single Systems 


Knowing that LV models fail to describe observed phenomena, it may look pointless to scrutinize 
them in further detail. Still, on the side of foundations of physics, a sizeable amount of work has 
been devoted precisely to that. The reason is that, from the perspective of interpreting quantum 
theory, Bell nonlocality is not fully satisfactory, insofar as it treats single systems in a trivial way. It 
is more than fair to argue that non-classicality should be captured by a notion that applies to single 
systems too. 

In this appendix, I mainly review notions that lead to the Kochen-Specker version of contex- 
tuality and its recent improvements, because these are the notions that are most closely related to 
nonlocality. A text more foundational than mine should devote a sizeable section to the discussion 
of so-called ψ-epistemic models and their restrictions, triggered by the result of Pusey, Barrett, and
Rudolph (2012). For these developments, I direct the reader to the masterful review by Matthew 
Leifer (2014). 


D.1 Overview: Looking into Measurements 


Let us start from Fine’s theorem (subsection 2.3.3): A behavior is local if and only if there exists a
joint probability distribution for the outputs. For a single system, this is clearly the case: For every
input, one can specify the outputs, so one could just take $P(a_1, a_2, \ldots, a_m) = \prod_x P(a_x|x)$. But in
physics, to the input $x$ one associates a measurement $\mathcal{M}^x = \{\Pi^x_a \,|\, a \in \mathcal{A}\}$. The paths to discovering
quantum effects even in single systems pass through looking more closely into those measurements (at
the price of losing the device-independence that is the power of Bell nonlocality).
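The triviality of the single-system case can be seen in a few lines. The sketch below assumes a hypothetical behavior with three inputs and binary outputs (the numbers are arbitrary), builds the product distribution $\prod_x P(a_x|x)$, and checks that its marginals return the original $P(a|x)$.

```python
from itertools import product

# Hypothetical single-system behavior: P[x][a] = P(a|x), three inputs, binary outputs.
P = [
    {0: 0.5, 1: 0.5},
    {0: 0.9, 1: 0.1},
    {0: 0.2, 1: 0.8},
]

# Joint distribution over one output per input: prod_x P(a_x|x).
joint = {}
for outputs in product((0, 1), repeat=len(P)):
    weight = 1.0
    for x, a_x in enumerate(outputs):
        weight *= P[x][a_x]
    joint[outputs] = weight

def marginal(x, a):
    """Probability that the output attached to input x equals a."""
    return sum(w for outputs, w in joint.items() if outputs[x] == a)

max_error = max(abs(marginal(x, a) - P[x][a])
                for x in range(len(P)) for a in (0, 1))
normalization = sum(joint.values())
```

Any single-system behavior passes this test, which is why extra structure on the measurements is needed before anything non-classical can show up.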

The obvious path is the one through the full characterization of the degrees of freedom that 
are being measured. If I know that what is being measured is a magnetic moment, and that 
the apparatus is a gradient of magnetic field, then no classical magnetic moment will behave as 
observed in the Stern-Gerlach experiment. If I know that homodyne measurements are being 
performed on optical fields, then no classical field can yield a negative Wigner function. The list 
can obviously continue to include all “quantum effects” reported in the last century, and those 
which mostly concern single systems without any hint of Bell nonlocality. We refer to your favorite 
list of such phenomena for details. 

But one can see issues with LV descriptions with a much weaker level of characterization. Each
of the $\Pi^x_a$ represents a physical property. Suppose now that two different measurements $\mathcal{M}^x \neq \mathcal{M}^{x'}$
are guaranteed to share one physical property, say $\Pi^x_1 = \Pi^{x'}_2$. A classical theory would then request
that $a_x = 1$ if and only if $a_{x'} = 2$: Either the system possesses that property, or it doesn’t, prior
to the measurement and irrespective of which other properties are tested alongside it (the
“context”). Kochen and Specker (1967) proved that, if one takes sufficiently many measurements
that share some common properties, it is impossible to assign outputs according to the classical
recipe. In other words, a LV model can assign a determined output to every measurement taken as
a whole, but can’t assign pre-existing physical properties. This is the starting point of contextuality.
Before delving more into it (section D.3), we review the only physical system for which the KS
theorem does not apply.


D.2 Bell's Local Hidden Variable Model for One Qubit 


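A LV model that reproduces exactly the quantum statistics of projective measurements on a single
qubit was provided by Bell (1966). If the qubit is in the state $\rho = \frac{1}{2}(I + \vec{m}\cdot\vec{\sigma})$, the quantum prediction
for measurement along direction $\hat{a}$ is

$$P(a|\hat{a}) = \frac{1}{2}\left(1 + a\,\vec{m}\cdot\hat{a}\right), \quad\text{that is}\quad \langle a\rangle = \vec{m}\cdot\hat{a} \tag{D.1}$$

for $a \in \{-1, +1\}$. In the LV model, the state of the system is represented by $(\vec{m}, \vec{\lambda})$ where $\vec{\lambda}$ is a unit
vector. The output is deterministically computed to be

$$a_\lambda(\hat{a}) = \mathrm{sign}\left[(\vec{m} + \vec{\lambda})\cdot\hat{a}\right] \tag{D.2}$$

which is either +1 or −1 as it should.
Assuming that $\vec{\lambda}$ is drawn from the uniform distribution on the sphere, i.e., $\rho(\vec{\lambda})\,d\vec{\lambda} = \frac{1}{4\pi}\sin\theta\,d\theta\,d\varphi$, it is easy to prove¹ that

$$\langle a\rangle_{\hat{a}} = \int d\vec{\lambda}\,\rho(\vec{\lambda})\,a_\lambda(\hat{a}) = \vec{m}\cdot\hat{a} \tag{D.3}$$

which matches the quantum prediction. Notice that nothing in this model requires $\vec{\lambda}$ to be drawn
“at random” in each round: The sequence of $\vec{\lambda}$ may be pre-registered.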
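The agreement with the quantum prediction can also be checked by sampling. The following is a minimal Monte Carlo sketch, assuming the deterministic rule $a = \mathrm{sign}[(\vec{m} + \vec{\lambda})\cdot\hat{a}]$ with $\vec{\lambda}$ uniform on the unit sphere; the Bloch vector, the measurement direction, the number of rounds and the seed are arbitrary illustrative choices.

```python
import math
import random

def sample_unit_vector(rng):
    """Draw lambda uniformly on the unit sphere (uniform cos(theta) and phi)."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def outcome(m, lam, a_hat):
    """Deterministic output a = sign[(m + lambda) . a_hat]."""
    return 1 if dot(m, a_hat) + dot(lam, a_hat) >= 0 else -1

def simulated_mean(m, a_hat, n_rounds=200_000, seed=0):
    """Average output over n_rounds hidden variables drawn with a fixed seed."""
    rng = random.Random(seed)
    total = sum(outcome(m, sample_unit_vector(rng), a_hat)
                for _ in range(n_rounds))
    return total / n_rounds

m = (0.3, -0.2, 0.5)      # hypothetical Bloch vector with |m| <= 1
a_hat = (0.6, 0.0, 0.8)   # unit vector defining the measurement direction

estimate = simulated_mean(m, a_hat)
quantum_prediction = dot(m, a_hat)
```

Because the list of hidden variables is fixed by the seed, the run also illustrates the remark that the sequence of $\vec{\lambda}$ may be registered in advance.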

Bell’s model has all the desired features of a LV model. It is not “contextual” for a simple reason: 
The property m can only be measured together with —m, and thus there is only one “context.” 
Even so, one can find it unpleasant for its “ontological excess baggage” (Hardy, 2004): While each 
measurement can extract at most one bit, the model describes the system by a vector, i.e., with an
infinite amount of information; and it was proved that one cannot get away with less (Montina,
2006; Dakić et al., 2008).


D.3 Contextuality 


D.3.1 History at a glance 


The mathematical definition of Bell nonlocality is peacefully accepted by everyone. By contrast, 
one finds various formal definitions of contextuality. At the moment of writing, two main threads 
can be identified.²


1 Without loss of generality, one can choose $\hat{a} = \hat{z} \equiv (\theta = 0, \varphi)$, since nothing else in the problem specifies
the choice of spherical coordinates.

2 It is possible to present these two threads in a unified mathematical framework (Blass and Gurevich, 2017);
of course, this doesn’t mean that they are identical.

The first thread stems from the pioneering work of Kochen and Specker (KS) (1967). As
sketched previously, it is an attempt at attributing truth values to physical properties, reaching the
conclusion that this is not possible. Over the years, the argument underwent several simplifications,
but retained assumptions that are considered problematic: It works only for deterministic hidden
variables, it requires importing structures typical of quantum theory, and it is based on the
possibility of performing measurements that are “ideal” in several respects. In 2008 Klyachko,
Can, Binicioglu, and Shumovsky (KCBS) managed to remove the assumptions of determinism
and the reference to quantum structures in KS-type contextuality (Klyachko et al., 2008), leading
to a revival of results.

The second thread started a few years before the KCBS breakthrough, when Spekkens (2005)
proposed a radically new approach to address non-classicality of single systems, which he also
called contextuality. His approach relies on the formalism of ontological models, which have been the
object of several works in foundations (Leifer, 2014). Instead of trying to attribute truth values
to properties for any single measurement, Spekkens reasons on the structure of the theory: He asks that
two preparations that give the same statistics for all measurements should be identified (“preparation
non-contextuality”), and that two measurements that give the same statistics for all preparations
should be identified (“measurement non-contextuality”). This approach gets away from the
need to define ideal measurements, although the experimental difficulty is only displaced: One
must now be convinced that two preparations (or two measurements) are identical. Intriguingly,
measurement contextuality can be demonstrated also for one qubit. This is welcome, insofar as we
all believe that a qubit is a quantum system too; but it shows how deep into theoretical structures
this definition must delve, in order to find flaws even in the LV model presented in section D.2.

Spekkens’ approach definitely belongs to the important advances in quantum foundations,
but it is too far from Bell nonlocality to be discussed in this appendix. In the remainder of this 
section, we shall only introduce the KS-type approaches, starting from KCBS. 


D.3.2 The KCBS inequality 


We present the KCBS construction. In a device-independent perspective, it consists of a single
device with five inputs and four outputs. In order to uncover non-classicality, we need to add some
structure. We assume that each input calls two processes, each with binary output. Specifically, we
consider five processes $A_k$, $k \in \{1, 2, 3, 4, 5\}$, and the inputs call the pairs {(1, 2), (2, 3), (3, 4),
(4, 5), (5, 1)}. The output of the process $A_j$ is denoted $a_j \in \{-1, +1\}$; the output of the device to
the input $(j, k)$ is therefore the pair $(a_j, a_k)$. The need for “ideal” measurements manifests itself as
follows: The process $A_j$ must be the same, whether it is called together with $A_{j-1}$ or with $A_{j+1}$.

At this point, we proceed as we did when we introduced CHSH (section 1.3): We assume that
all $(a_1, a_2, a_3, a_4, a_5)$ have a well-defined value. Then it is easy to prove that

$$a_1 a_2 + a_2 a_3 + a_3 a_4 + a_4 a_5 + a_5 a_1 \geq -3. \tag{D.4}$$

This leads to the corresponding inequality for the observed average values. KCBS proved that
there exists a set of five binary observables on a qutrit such that the two observables in each pair
commute, and such that the inequality is violated for some quantum states.
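Both sides of the KCBS argument are small enough to check by brute force. The sketch below enumerates the 32 classical assignments to confirm the bound −3, then evaluates the quantum value for a commonly used pentagram construction on a qutrit. That construction is an assumption here and is not spelled out in this appendix: five real unit vectors $v_k$ with $v_k \perp v_{k+1}$, observables $A_k = I - 2|v_k\rangle\langle v_k|$, and the state along the symmetry axis.

```python
import math
from itertools import product

def kcbs_sum(a):
    """a_1 a_2 + a_2 a_3 + a_3 a_4 + a_4 a_5 + a_5 a_1 for a in {-1,+1}^5."""
    return sum(a[k] * a[(k + 1) % 5] for k in range(5))

# Classical side: exhaustive search over all pre-assigned values.
classical_min = min(kcbs_sum(a) for a in product((-1, 1), repeat=5))

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Quantum side (assumed pentagram construction): cos^2(theta) = 1/sqrt(5)
# makes consecutive vectors orthogonal, so A_k and A_{k+1} commute.
cos_t = math.sqrt(1 / math.sqrt(5))
sin_t = math.sqrt(1 - 1 / math.sqrt(5))
vectors = [(sin_t * math.cos(4 * math.pi * k / 5),
            sin_t * math.sin(4 * math.pi * k / 5),
            cos_t) for k in range(5)]
psi = (0.0, 0.0, 1.0)

max_overlap = max(abs(dot(vectors[k], vectors[(k + 1) % 5])) for k in range(5))

# With P_k = |v_k><v_k| and P_k P_{k+1} = 0, one has
# A_k A_{k+1} = I - 2 P_k - 2 P_{k+1}, so the sum of the five
# correlators is 5 - 4 * sum_k <psi|P_k|psi>.
quantum_value = 5 - 4 * sum(dot(psi, v) ** 2 for v in vectors)
```

Under these assumptions the quantum value is $5 - 4\sqrt{5}$, which lies below the classical bound.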


D.3.3 Before and after KCBS 


Having understood KCBS, we can appreciate the difference with the original KS proof, as well as 
some later improvements.

The original KS proof relied on the fact that projective measurements on qutrits are defined by
three orthogonal projectors. In each such triple, a non-contextual LV model would assign one 1 (for 
the property that the system possesses) and two 0’s. The KS tour de force consisted in identifying 
132 such triples, involving 117 projectors, such that no such assignment can be made: There will 
always be a triple that has either three 0’s or more than one 1. The geometry imposed by the 
structure of quantum theory plays a crucial role, and the argument fails if one does not request 
the LV model to be deterministic. Both limitations are absent in the KCBS argument: Only the 
notion of joint measurability is retained, and an analog of Fine’s theorem can be proved to connect 
deterministic and non-deterministic LVs. 

One very striking feature of the original KS proof is that it is state-independent: The contradiction 
is due to the quantum structure of the measurements, irrespective of any description of the state. 
This is very different from what happens in Bell nonlocality: There, LV models can always be 
constructed, it’s only some observed statistics (some states) that are incompatible with them. The 
KCBS inequality is state-dependent, but it took only a few months for Cabello to find a similar 
inequality that is state-independent (Cabello, 2008). A few years later, Yu and Oh (2012) found a 
state-independent criterion that requires only 13 projectors; and this was proved to be the smallest 
possible number (Cabello et al., 2016). 


D.3.4 Contextuality and nonlocality: A comparison 


We conclude this appendix with a comparison between contextuality and nonlocality. It is often 
said that contextuality is “more general”: My experience in science is that any statement containing 
this g-word needs qualification. 

Here are the similarities: 


• In terms of mathematical description, one can indeed say that nonlocality is a special case of
contextuality. Both rely on versions of Fine’s theorem and the corresponding possibility of
enclosing the classical options in a polytope. Within quantum theory, a scenario is defined
by listing which operators commute (see the definition Q’ of the quantum set in subsection
6.2.2). One of the most explicit efforts to show the common structure of nonlocality and
contextuality is the work of Cabello, Severini, and Winter (2010, 2014).


• In terms of physical principles, as we have mentioned in section 10.4, the physical principle
that was called Local Orthogonality for nonlocality has its exact counterpart for contextuality,
where it was called Consistent Exclusivity (and where it seems to be actually more powerful
in singling out quantum behaviors).


However, there are also important differences. 


• Within quantum theory, contextuality can be made state-independent, while nonlocality
can’t. Also, the commutation relations for nonlocality are imposed by the locality condition
(and in this sense can be called “device-independent”). By contrast, in a generic scenario
of contextuality, and notably in single-particle scenarios, the commutation relations must be
justified by some characterization of the devices.


• At a more operational level, it is interesting to compare the resources needed for the classical
simulation of such non-classical behaviors. The classical simulation of Bell nonlocality
requires communication, as abundantly discussed in this book. For some time, it was believed
that the simulation of contextuality would be trivial, but recently it was noticed that the
simulation of some contextuality scenarios requires memory (Tavakoli and Cabello, 2018).
Now, one can try a unifying narrative by noticing that communication is “transmission
of information across space,” and memory is “transmission of information across time.” But
our current understanding of physics breaks this symmetry. Because communication faster
than light is impossible, one can exclude communication by arranging space-like separated
events. By contrast, memory is only limited by storage capacity (the amount of information
that can be stored and the lifetime of the storage). While surely finite, these capacities are
very large and constantly being improved: Only very complex contextuality criteria may
possibly challenge the current state-of-the-art. Notice that one could base nonlocality on
similar considerations: One could believe that communication may happen at any speed,
then base the evidence of nonlocality on Bell tests different from those we described, whose
simulation involves very high communication complexity. While these definitely exist on
paper (subsection 11.2.2), none is practically feasible.

Let me finish with a more subjective argument for the difference between communication 
and memory. Devices do not communicate spontaneously, they must have been engineered 
on purpose; by contrast, a device may retain information about its past even without being 
purposefully designed for it. Similarly, we could easily believe that a physical system may 
retain some memory of its past states; it’s far less easy to believe that two devices commu- 
nicate if we don’t see (with our eyes, or with our knowledge of physics) a communication 
channel. 


Appendix E 
Basic Notions of Convex Optimization 


This appendix begins with a very concise presentation of the main notions of convex optimization. 
This has been distilled from (Boyd and Vandenberghe, 2004), keeping the same notations so that 
the readers can easily continue their study with that book. Two examples introduced in the main 
text are then worked out explicitly. 


E.1 Generalities 


E.1.1 Convex programs and their Lagrange dual 
A convex program is an optimization that can be cast in the following form:

$$\begin{aligned} p^* = \min\ & f_0(x) \\ \text{subject to}\ & f_i(x) \leq 0,\quad i = 1, \ldots, m \\ & h_i(x) = 0,\quad i = 1, \ldots, p \end{aligned} \tag{E.1}$$

where $x \in \mathbb{R}^n$ are the variables, the objective function $f_0$ and the functions $f_i$ are convex functions
with value in $\mathbb{R}$, and the $h_i$ are affine functions with value in $\mathbb{R}$. A point $x$ in the intersection $\mathcal{D}$ of
the domains of these functions is called feasible if it satisfies the constraints. If no point satisfies the
constraints, the program is infeasible and by convention one sets $p^* = +\infty$.

Since this is an optimization under constraints, it is natural to approach it with the technique of
Lagrange multipliers. The Lagrangian is defined as the function $\mathcal{D} \times \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$

$$L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x). \tag{E.2}$$

From this, one defines the Lagrange dual function

$$g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu). \tag{E.3}$$

Because of the convexity assumptions in the primal and the positivity of $\lambda$, $g(\lambda, \nu)$ is concave and
satisfies $g(\lambda, \nu) \leq p^*$. The Lagrange dual program of the primal (E.1) is

$$d^* = \sup_{\lambda \in \mathbb{R}^m_+,\ \nu \in \mathbb{R}^p} g(\lambda, \nu). \tag{E.4}$$


It is also a convex problem, since it can be rewritten as the minimization of the convex 
function —g. Notice that this problem is formally unconstrained; however, natural constraints may 
appear, as we shall see in the next example of linear programs.

Clearly, $d^* \leq p^*$. When equality holds, one says that the convex program exhibits strong duality;
when it does not hold, the quantity $p^* - d^*$ is called the duality gap.

As mentioned in the main text, the possibility of writing a dual program allows one to compute both
an upper and a lower bound for the quantity of interest (typically the solution of the primal). Let
us make this statement more precise. If both the primal and the dual are feasible, for any feasible
$x_f$ of the primal and any feasible $\xi_f = (\lambda, \nu)_f$ of the dual it holds

$$g(\xi_f) \leq d^* \leq p^* \leq f_0(x_f). \tag{E.5}$$

If in addition strong duality holds, the algorithm usually converges to these bounds being the same
within numerical precision.


E.1.2 Special case: Linear programs 


A convex program is a linear program if all the functions $f_0$ and $f_i$ are affine. The objective function
can then be taken as linear, since a constant can always be added after the optimization. The very
frequent case $f_i(x) = -x_i$ defines the standard form of a linear program

$$\begin{aligned} p^* = \min_x\ & c^T x \\ \text{subject to}\ & x \geq 0 \\ & Ax = b. \end{aligned} \tag{E.6}$$

A typical example is the optimization over a stochastic vector, $x \geq 0$ imposing the non-negativity
of probabilities. The membership problem for the local polytope (2.22) can be cast in this form,
see subsection E.2.1.

Let us find the Lagrange dual of (E.6). The Lagrangian is

$$L(x, \lambda, \nu) = c^T x - \lambda^T x + \nu^T (Ax - b) = -\nu^T b + (c - \lambda + A^T \nu)^T x.$$

The Lagrange dual function is the infimum over $x$, which is $-\infty$ unless $c - \lambda + A^T \nu = 0$. We
can treat this condition as a constraint for the dual, since it is a necessary condition to obtain a
non-trivial bound; and we can rewrite it as $A^T \nu + c \geq 0$ since $\lambda \geq 0$. In summary, the Lagrange
dual program of (E.6) is

$$\begin{aligned} d^* = \max_\nu\ & -b^T \nu \\ \text{subject to}\ & A^T \nu + c \geq 0. \end{aligned} \tag{E.7}$$
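A toy instance makes the primal-dual pair tangible. The sketch below assumes a hypothetical two-variable program in standard form (minimize $c^T x$ with $x \geq 0$ and $Ax = b$) and its dual (maximize $-b^T\nu$ with $A^T\nu + c \geq 0$); all numbers are arbitrary. Instead of calling a solver, it scans a grid of feasible points on both sides, which is enough to see weak duality on every pair and strong duality at the optimum.

```python
# Toy instance (hypothetical numbers): minimize x_1 + 2 x_2
# subject to x >= 0 and x_1 + x_2 = 1.
c = (1.0, 2.0)
A = ((1.0, 1.0),)
b = (1.0,)

def primal_value(x):
    """Objective c^T x of the primal."""
    return sum(ci * xi for ci, xi in zip(c, x))

def dual_value(nu):
    """Objective -b^T nu of the dual."""
    return -sum(bj * nuj for bj, nuj in zip(b, nu))

def dual_feasible(nu, tol=1e-12):
    """Check A^T nu + c >= 0 componentwise."""
    return all(sum(A[j][i] * nu[j] for j in range(len(b))) + c[i] >= -tol
               for i in range(len(c)))

steps = 1000

# Primal feasible set: x = (t, 1 - t) with 0 <= t <= 1.
primal_points = [(t / steps, 1 - t / steps) for t in range(steps + 1)]
primal_values = [primal_value(x) for x in primal_points]
p_star = min(primal_values)

# Dual candidates: nu scanned over [-2, 2]; only feasible ones are kept.
dual_points = [(-2 + 4 * t / steps,) for t in range(steps + 1)]
dual_values = [dual_value(nu) for nu in dual_points if dual_feasible(nu)]
d_star = max(dual_values)

weak_duality_holds = max(dual_values) <= min(primal_values) + 1e-12
```

For real problems one would hand both programs to a dedicated solver; the grid scan is only meant to make the inequalities between dual and primal values visible.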


E.1.3 Special case: Semidefinite programs (SDPs) 


A semidefinite program differs from a linear program in that the affine inequality constraints read
$\sum_i x_i F_i \preceq 0$ where $F_i \in \mathbb{R}^{k \times k}$ are symmetric matrices of some size (the case of a linear program is
recovered for $k = 1$). So the primal is given by

$$\begin{aligned} p^* = \min_x\ & c^T x \\ \text{subject to}\ & \sum_i x_i F_i \preceq 0 \\ & Ax = b. \end{aligned} \tag{E.8}$$

The Lagrangian is similar to the previous one, but now the Lagrange multiplier of the semidefinite constraint
is a matrix $Z \succeq 0$:

$$\begin{aligned} L(x, Z, \nu) &= c^T x + \mathrm{Tr}\Big(Z \sum_i x_i F_i\Big) + \nu^T (Ax - b) \\ &= -\nu^T b + \sum_i \big(c_i + \mathrm{Tr}(Z F_i) + (A^T \nu)_i\big)\, x_i. \end{aligned}$$

The same reasoning as before implies that the Lagrange dual is

$$\begin{aligned} d^* = \max_{\nu, Z}\ & -b^T \nu \\ \text{subject to}\ & c_i + \mathrm{Tr}(Z F_i) + (A^T \nu)_i = 0 \text{ for all } i \\ & Z \succeq 0. \end{aligned} \tag{E.9}$$


E.2 Two Explicit Examples 


E.2.1 The membership problem for the local polytope 


As an example of linear program, we look at the membership problem for the local polytope (2.22).
The variable is $x = q \in \mathbb{R}^{|\mathrm{LD}|}$, the vector of weights of the extremal points of the local polytope ($|\mathrm{LD}|$
being the number of such points). In
the formulation of that problem in the main text, there is no objective function to minimize. One
could be tempted to minimize $|P - \sum_i q_i P_{\mathrm{LD},i}|$, which is not linear, but there exist tricks to write
it as a linear program, see Eqs (1.6) and (1.7) of (Boyd and Vandenberghe, 2004). But the standard
solution is simpler: One takes all the conditions (2.22) as constraints and sets $f_0(x) = 0$. The
outcome will then be $p^* = 0$ if the program is feasible, $p^* = +\infty$ if it is not; and this is exactly
what we want to know: whether the constraints can be satisfied or not.

Because the only inequality constraint is the non-negativity of probabilities $x \geq 0$, and all the
equality constraints are affine, this problem can be cast as a linear program in the standard form
(E.6). Clearly, $c = 0 \in \mathbb{R}^{|\mathrm{LD}|}$. To identify $A$ and $b$, recall that a behavior can be represented by a
vector in $\mathbb{R}^{D_{NS}}$ (for instance, the entries of the Collins-Gisin representation). Let us denote by $v_P$
such a vector, using for simplicity $v_i$ for the vector representing the local deterministic behavior
$P_{\mathrm{LD},i}$. Then $A \in \mathbb{R}^{(D_{NS}+1) \times |\mathrm{LD}|}$ and $b \in \mathbb{R}^{D_{NS}+1}$ are given by

$$A = \begin{pmatrix} v_1 & v_2 & \cdots & v_{|\mathrm{LD}|} \\ 1 & 1 & \cdots & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} v_P \\ 1 \end{pmatrix}, \tag{E.10}$$

the last entries describing the constraint $\sum_i q_i = 1$.
Let us now turn to the description (E.7) of the dual. Denoting by $\nu'$ the first $D_{NS}$ components
of $\nu$, and by $\nu_0$ the last component, it’s easy to verify that the dual reads

$$\begin{aligned} d^* = \max\ & -(v_P^T \nu' + \nu_0) \\ \text{subject to}\ & v_k^T \nu' + \nu_0 \geq 0 \quad \forall k = 1, \ldots, |\mathrm{LD}|. \end{aligned} \tag{E.11}$$

If $P$ is local, $v_P$ is a convex combination of the $v_k$; then the constraints force $v_P^T \nu' + \nu_0 \geq 0$, whence
$d^* = 0$. Thus, if the primal is feasible, the dual is also feasible and we have strong duality. Conversely,
if $P$ is nonlocal, one can find $\nu$ such that $v_P^T \nu' + \nu_0 < 0$. This is nothing else than the definition of
a Bell inequality violated by $P$. Once such a $\nu$ is found, any scaling $\alpha\nu$ with $\alpha > 0$ is also a solution; therefore
$d^* = +\infty$. Thus, when the primal is infeasible, the dual is unbounded. In order for the algorithm to
output a Bell inequality with finite coefficients, one can add the additional constraint $v_P^T \nu' + \nu_0 \geq -1$
to the dual, as suggested in Eq. (19) of (Brunner et al., 2014).
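The role of a dual-feasible point as a Bell inequality can be illustrated in the CHSH scenario. The sketch below makes a simplifying assumption that departs from the Collins-Gisin representation: each behavior is summarized only by its four correlators $(E_{00}, E_{01}, E_{10}, E_{11})$. It takes $\nu'$ equal to minus the CHSH coefficients and $\nu_0 = 2$, checks the dual constraint on all 16 local deterministic points, and shows that the certificate is negative on the correlators of a maximally violating quantum behavior.

```python
import math
from itertools import product

def deterministic_points():
    """Correlator vectors (a0*b0, a0*b1, a1*b0, a1*b1) of the 16 local
    deterministic strategies with outputs in {-1, +1}."""
    return [(a0 * b0, a0 * b1, a1 * b0, a1 * b1)
            for a0, a1, b0, b1 in product((-1, 1), repeat=4)]

# Candidate dual point (assumed): nu' = -(CHSH coefficients), nu_0 = local bound.
nu_prime = (-1.0, -1.0, -1.0, 1.0)
nu_0 = 2.0

def certificate(v):
    """v^T nu' + nu_0, i.e. 2 minus the CHSH value of the correlator vector v."""
    return sum(vi * ni for vi, ni in zip(v, nu_prime)) + nu_0

# Dual feasibility: the constraint must hold on every local deterministic point.
local_values = [certificate(v) for v in deterministic_points()]
dual_feasible = min(local_values) >= 0

# Correlators of a quantum behavior reaching the CHSH value 2*sqrt(2).
s = 1 / math.sqrt(2)
v_quantum = (s, s, s, -s)
quantum_certificate = certificate(v_quantum)
```

A negative certificate on $v_P$ together with non-negative values on every deterministic point is precisely a violated Bell inequality; rescaling $(\nu', \nu_0)$ by a positive factor makes the dual objective arbitrarily large, which is the unboundedness described above.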


E.2.2 An example of SDP 


As an example of SDP, we look at a relaxation of the quantum upper bound for the guessing
probability (8.8) that reads

$$\begin{aligned} \bar{P} = \max_{\{P_{(a',b')}\}}\ & \sum_{(a',b')} P_{(a',b')}(a', b'|0, 0) \\ \text{subject to}\ & \sum_{(a',b')} P_{(a',b')} = P \\ & P_{(a',b')} \in \mathcal{Q}_\ell \text{ for all } a' \in \mathcal{A},\ b' \in \mathcal{B}. \end{aligned} \tag{E.12}$$

The last condition is captured by the positivity of hermitian moment matrices $\Gamma^{(a',b')}$. Now we are
going to cast this problem in the standard form, following (Bancal et al., 2014). To avoid confusion,
the inputs of the Bell test will be denoted by $X, Y$, leaving $x, y$ for the semidefinite variables.

Let’s set $\Gamma^{(a',b')} = \sum_i x_i^{(a',b')} F_i$ where $\{F_i\}$ is a fixed basis of hermitian matrices. Let’s also define
$K_{abXY}$ as the constant matrix that picks up the element $(a, b|X, Y)$ of a behavior in the moment
matrix, that is $P_{(a',b')}(a, b|X, Y) = \mathrm{Tr}(K_{abXY}\,\Gamma^{(a',b')}) = \sum_i C_{abXY,i}\, x_i^{(a',b')}$ with $C_{abXY,i} = \mathrm{Tr}(K_{abXY} F_i)$.
Now the primal reads

$$\begin{aligned} \bar{P} = \max_x\ & \sum_{a',b'} P_{(a',b')}(a', b'|0, 0) \\ \text{subject to}\ & \sum_{a',b'} \sum_i C_{abXY,i}\, x_i^{(a',b')} = P(a, b|X, Y) \text{ for all } a, b, X, Y \\ & \sum_i x_i^{(a',b')} F_i \succeq 0 \text{ for all } a', b'. \end{aligned} \tag{E.13}$$

This is of the form (E.8) with the opposite sign for $x$, the matrix $A_{ji} = C_{abXY,i}$, and $b_j = -P(a, b|X, Y)$
with $j = (abXY)$. Thus we know that the dual will read (using $y = -\nu$)

$$\begin{aligned} d^* = \min_{y, Z}\ & \sum_{abXY} P(a, b|X, Y)\, y_{abXY} \\ \text{subject to}\ & C_{a'b'00,i} + \mathrm{Tr}(F_i Z^{(a',b')}) - \sum_{abXY} C_{abXY,i}\, y_{abXY} = 0 \text{ for all } a', b', i \\ & Z^{(a',b')} \succeq 0 \text{ for all } a', b'. \end{aligned} \tag{E.14}$$

If strong duality holds, i.e., $d^* = \bar{P}$, the solution $y^*$ of the dual defines a hyperplane in the space of
behaviors, such that $\sum_{abXY} P(a, b|X, Y)\, y^*_{abXY} = \bar{P}$. In other words, the result of the optimization
can be read by evaluating the suitable Bell inequality on the observed behavior (though this suitable
inequality usually can’t be guessed without solving the problem anyway).


Appendix F 
Device-Independent Certification: 
History and Review 


F.1 Bell Nonlocality and Quantum Information Science 


F.1.1 Before 2005: Nonlocality as an inspiration from the past 


Quantum information science burst on the scene of physics in the last decade of the twentieth 
century. Quantum cryptography (more precisely, quantum key distribution, QKD) was invented 
by Bennett and Brassard (1984) and by Ekert (1991). It promised the possibility of distributing a 
secret between distant players communicating over a channel that can be tapped into, something 
impossible to achieve without the exchange of quantum systems. Shortly after that, Shor (1994) 
proved that a quantum computer would reduce the time needed to solve some specific problems, 
notably the factoring of large integers into prime factors. 

The work of Bell has constantly been cited as an inspiration for quantum information science, 
insofar as it had replaced philosophical debates with a testable argument. The quantum infor- 
mation attitude was similar. Take the example of incompatible measurements: Generations of 
physicists have been puzzled by this fact, and many had even hoped to find a way out of that. 
Bennett and Brassard, on the contrary, accept the fact and use it to fool an eavesdropper: If the 
infamous Eve taps on the channel to learn the secret information carried by the quantum system, 
she will unavoidably modify the state of the system and her presence will be noticed. Ekert used 
Bell nonlocality itself to reach the same conclusion: If the outputs were not determined prior to 
the measurements, as the violation of a Bell inequality certifies, then those correlated numbers are 
secret for anyone that has not seen them. Indeed, nobody could have a copy of numbers that did 
not exist. 

However, for several years Bell nonlocality was not a major topic in quantum information science. 
Sure enough, riding on the success and popularity of entanglement theory, the theory of nonlocality 
also thrived more than ever before: Many of the works cited in Part I were produced during those 
years, and nonlocality was brought into the community of theoretical computer science through 
the formalization of “non-local games” (Cleve et al., 2004). Also, some experimental groups took
advantage of the favorable climate to realize improved Bell tests, whose intriguing character was
duly noted (Weihs et al., 1998; Tittel et al., 1998; Stefanov et al., 2002). But from a quantum
information perspective, Bell inequalities were seen as sub-optimal entanglement witnesses that trade
power in detection of entanglement (they will never detect the entanglement of Werner states whose
behavior is local) for the capacity to rule out LV models (already very convincingly falsified
by several experiments). By and large, the future seemed to belong to approaches better tailored
to the advances of entanglement theory.

The change of perspective took place around the year 2005 and was completed by 2007, opening
the field of device-independent (DI) certification. At the moment of writing, DI certification has
grown into a full sub-field of quantum information (section F.2). This development has not come
without confusions, notably about the actual extent of the independence from the devices. These
will be clarified in section F.3. Before going to that, we revisit the history of device-independence.


F.1.2 The tortuous path to device-independence 


Pre-history first—or lack thereof: The notion of device-independence remains hidden in all the 
classic literature on nonlocality. This is not to say that derivations of Bell inequalities were wrong: 
They are certainly correct in all the works cited in Part I and many others; a close scrutiny may 
even find traces of the notion, which is natural after all. But certainly no prominence was given to 
it, rather the opposite: Emphasis was often put on the physical degree of freedom for which some 
inequalities were thought to be tailored.¹

The idea that Bell inequalities can be used for certification appears first and very explicitly in 
Artur Ekert’s re-discovery of QKD (Ekert, 1991). In one of those strange turns of fate, the intuition 
was immediately killed: Bennett, Brassard, and Mermin (1992) set out to argue that the violation 
of Bell’s inequality is not the right way to look at QKD, one has to look for entanglement.² They
went on to prove that, for qubits, Ekert’s protocol is equivalent to an entanglement-based version 
of the original BB84 protocol. The caveat “for qubits” was not noticed, by them or anyone else: 
The QKD literature uniformly repeated the claim that Ekert’s protocol and BBM92 are equivalent 
till 2006. 

Device-independence in QKD resurfaced in the pioneering work of Mayers and Yao (1998; 
2004) that we have presented in detail in section 7.2. Mayers and Yao only proved self-testing of the 
ideal correlations, which doesn’t count as a security proof—to their credit, unconditional security 
proofs had not been developed yet: Mayers himself was finalizing the first one. To add to the 
complication, Mayers’ papers were very hard to read because of their mathematical sophistication. 
In this situation, the Mayers-Yao result was taken note of but not followed up: The field of QKD 
rather grew using more understandable approaches to security proofs. 

The work that proved decisive to trigger the awareness of device-independence was to come 
again from the side of key distribution, but with a very foundational twist. Barrett, Hardy, and 
Kent (2005a) realized that one could prove security using only the no-signaling assumption based 
on nonlocality. They invented a suitable protocol and provided a proof of principle, in which 
one secure bit can be extracted from an infinitely long raw key. In the last paragraph, this paper 
comments on Ekert’s protocol—but, quite astonishingly, upholds the received knowledge that 
nonlocality does not really play a role there. 


1 As already mentioned, Bell inequalities with d outputs were routinely referred to as being “for qudits” 
even in the title of papers (Collins et al, 2002a; Kaszlikowski et al., 2002; Zukowski and Brukner, 2002). For 
another example among many, Belinsky and Klyshko (1993a; 1993b) present their derivation of the MABK
family as novel because they had optical interference experiments in mind, while previous works described it 
with spins. As for textbooks of quantum mechanics or even quantum information that mention Bell nonlocality, 
the presentation always considers a concrete implementation of Bell tests, either with spins of electrons (Bohm’s 
thought experiment) or with polarization of photons (Aspect’s real experiment). This element of concreteness 
is certainly welcome for beginners, but the prose all too often mixes the structure of the Bell test with technical 
considerations about the devices. 

2 In the previous subsection I mentioned the marginalization of nonlocality by the emergent entanglement 
theory: It was definitely based on personal recollections, but this paper is an objective witness of that atmosphere.

Nevertheless, the idea was launched. In a paper that elaborates on security under the sole
no-signaling assumption, Acin, Gisin, and Masanes (2006) finally stressed that the equivalence
between BB84 and Ekert holds only for qubits.³ This crucial insight is presented in detail in the
next subsection. 


F.1.3 The last step 


The behavior that describes the correlations expected in an ideal, error-free run of the
Bennett-Brassard 1984 (BB84) protocol for QKD is

$$P(a,b|x,y) = \frac{1}{4}\left(1 + ab\,\delta_{x,y}\right) \qquad \text{(F.1)}$$

with $x, y \in \{0, 1\}$ and $a, b \in \{-1, +1\}$. In other words: When $x = y$ we have only $(a, b) = (+1, +1)$
or $(-1, -1)$, with equal probability; when $x \neq y$, $a$ and $b$ are uncorrelated.

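Let’s first assume Alice’s and Bob’s systems are qubits. A two-output POVM on a qubit $\{E_+, E_-\}$
can be parametrized as $E_\pm = \frac{1}{2}(\gamma_\pm \mathbb{1} \pm \eta\,\hat{n}\cdot\vec{\sigma})$ with $\gamma_\pm \geq 0$, $\gamma_+ + \gamma_- = 2$, $0 \leq \eta \leq \min(\gamma_+, \gamma_-)$,
and $|\hat{n}| = 1$. Using such expressions for Alice’s and Bob’s measurements and the generic form of a
two-qubit state (3.12), one obtains for a generic behavior

$$P(a,b|x,y) = \frac{1}{4}\left[\gamma^x_a \gamma^y_b + a\,\eta_x \gamma^y_b\,\mathrm{Tr}(\rho\,\hat{a}_x\cdot\vec{\sigma}) + b\,\eta_y \gamma^x_a\,\mathrm{Tr}(\rho\,\hat{b}_y\cdot\vec{\sigma}) + ab\,\eta_x \eta_y\,T_{xy}\right]$$

with $T_{xy} = \mathrm{Tr}(\rho\,\hat{a}_x\cdot\vec{\sigma} \otimes \hat{b}_y\cdot\vec{\sigma})$. To recover (F.1) it is necessary to set $\gamma^x_a = \gamma^y_b = 1$ for all $(a, b, x, y)$;
also, since $|T_{xy}| \leq 1$, we need $\eta_x = \eta_y = 1$ for all $(x, y)$: That is, the measurements are projective.
Furthermore, the state and measurement directions must satisfy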


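$$\mathrm{Tr}(\rho\,\hat{a}_x\cdot\vec{\sigma}) = \mathrm{Tr}(\rho\,\hat{b}_y\cdot\vec{\sigma}) = 0, \qquad \text{(F.2)}$$

$$\mathrm{Tr}(\rho\,\hat{a}_x\cdot\vec{\sigma} \otimes \hat{b}_y\cdot\vec{\sigma}) = \delta_{x,y}. \qquad \text{(F.3)}$$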


Without loss of generality, we can set $\hat{a}_0 = \hat{b}_0 = \hat{z}$, $\hat{a}_1 = \cos\alpha\,\hat{x} + \sin\alpha\,\hat{z}$ and $\hat{b}_1 = \cos\beta\,\hat{x} + \sin\beta\,\hat{z}$.
The conditions (F.3) read $T_{zz} = 1$, $T_{xz} = -\tan\alpha$, $T_{zx} = -\tan\beta$ and $T_{xx} = \frac{1 + \sin\alpha\sin\beta}{\cos\alpha\cos\beta}$. However,
$T_{zz} = 1$ forces $T_{zx} = T_{xz} = 0$. Indeed, consider $T_\theta = \mathrm{Tr}[\rho\,\sigma_z \otimes (\cos\theta\,\sigma_x + \sin\theta\,\sigma_z)]$. On the one
hand, $|T_\theta| \leq 1$ must hold because it’s the expectation value of an operator whose eigenvalues
are $+1$ and $-1$. On the other hand, $T_\theta = \cos\theta\,T_{zx} + \sin\theta\,T_{zz}$, whose achievable maximum over
$\theta$ is $\sqrt{T_{zx}^2 + T_{zz}^2}$. So, if $T_{zz} = 1$, $T_{zx}$ must be equal to zero. The proof is identical for $T_{xz}$, changing
the roles of Alice and Bob. Thus we are forced to set $\sin\alpha = \sin\beta = 0$, and the state is characterized
by $T_{zz} = 1$ and $T_{xx} = \pm 1$. As is well known, the positive solution identifies uniquely the state $|\Phi^+\rangle$,
the negative solution the state $|\Phi^-\rangle$. These states satisfy automatically the condition (F.2). Thus,


3 This paper made a few people angry for a short while. On the one side, some of those who had invested 
a lot of effort in proving the “unconditional security” of BB84 took some time to digest the idea that this 
was “unconditional under the assumption of perfectly characterized devices” (lesson to be learned: Don’t take 
buzzwords too seriously). On the other side, the author of this book missed the opportunity to be a co-author 
of this breakthrough for some really silly personal issues (fortunately, the issues were recomposed and he did 
not miss the next one).

in order to produce the local behavior (F.1) with two qubits, one needs maximally entangled states
and maximally incompatible measurements. 
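
The conclusion of this derivation can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes the maximally entangled state (|00> + |11>)/sqrt(2) and projective measurements of sigma_z for input 0 and sigma_x for input 1 on both sides, and compares the Born-rule probabilities with the right-hand side of (F.1). The helper names (`kron`, `projector`, `quantum_behavior`, `bb84_behavior`) are illustrative choices.

```python
from itertools import product

# Pauli matrices and identity as plain nested lists (row-major).
Z = [[1, 0], [0, -1]]
X = [[0, 1], [1, 0]]
I2 = [[1, 0], [0, 1]]

def kron(A, B):
    """Kronecker product of two 2x2 matrices, returned as a 4x4 matrix."""
    return [[A[i][j] * B[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]

def projector(sigma, outcome):
    """Projector (1 + outcome * sigma) / 2 on the eigenvalue `outcome` of sigma."""
    return [[(I2[i][j] + outcome * sigma[i][j]) / 2 for j in range(2)]
            for i in range(2)]

def expectation(op, psi):
    """<psi|op|psi> for a real four-component state vector."""
    return sum(psi[i] * op[i][j] * psi[j] for i in range(4) for j in range(4))

# Assumed state: (|00> + |11>) / sqrt(2). Assumed settings: input 0 -> Z, input 1 -> X.
PHI_PLUS = [2 ** -0.5, 0.0, 0.0, 2 ** -0.5]
SETTINGS = {0: Z, 1: X}

def quantum_behavior(a, b, x, y):
    """P(a,b|x,y) from the Born rule for the state and settings above."""
    joint = kron(projector(SETTINGS[x], a), projector(SETTINGS[y], b))
    return expectation(joint, PHI_PLUS)

def bb84_behavior(a, b, x, y):
    """Right-hand side of (F.1): (1 + a*b*delta_{x,y}) / 4."""
    return (1 + a * b * (1 if x == y else 0)) / 4

# Largest deviation between the two behaviors over all outputs and inputs.
max_deviation = max(
    abs(quantum_behavior(a, b, x, y) - bb84_behavior(a, b, x, y))
    for a, b, x, y in product((+1, -1), (+1, -1), (0, 1), (0, 1))
)
print(max_deviation)
```

The deviation vanishes up to floating-point rounding, which illustrates sufficiency of the state and measurements; the necessity part is what the analytical argument establishes.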

However, the behavior (F.1) is obviously local: It can be obtained by sampling, with equal
probability, the four local deterministic behaviors defined as (ao, a1; bo, b1) = (+1, +1; +1, +1), 
(+1, —1; +1, —1), (-1, +1; —1, +1), and (—1, —1; —1, —1). Thus, the security of the BB84 
protocol relies crucially on guaranteeing that signals are qubits—or more generally, that there is a 
“qubit fraction” in the signals [see the review (Scarani et al., 2009) for details and references]. 
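
The local decomposition just described is easy to verify by enumeration. The sketch below, with illustrative function names, mixes the four local deterministic behaviors with equal weights, compares the result with the BB84 behavior $\frac{1}{4}(1 + ab\,\delta_{x,y})$, and evaluates the CHSH combination $E_{00} + E_{01} + E_{10} - E_{11}$ for good measure.

```python
from fractions import Fraction
from itertools import product

# The four local deterministic behaviors (a0, a1; b0, b1) listed in the text.
STRATEGIES = [(+1, +1, +1, +1), (+1, -1, +1, -1), (-1, +1, -1, +1), (-1, -1, -1, -1)]

def local_behavior(a, b, x, y):
    """P(a,b|x,y) obtained by sampling the four strategies with equal probability."""
    hits = sum(1 for (a0, a1, b0, b1) in STRATEGIES
               if (a0, a1)[x] == a and (b0, b1)[y] == b)
    return Fraction(hits, len(STRATEGIES))

def bb84_behavior(a, b, x, y):
    """Ideal BB84 behavior (1 + a*b*delta_{x,y}) / 4."""
    return Fraction(1 + a * b * (1 if x == y else 0), 4)

def correlator(x, y):
    """E_xy = sum over outputs of a*b*P(a,b|x,y)."""
    return sum(a * b * local_behavior(a, b, x, y)
               for a, b in product((+1, -1), repeat=2))

reproduces_bb84 = all(
    local_behavior(a, b, x, y) == bb84_behavior(a, b, x, y)
    for a, b, x, y in product((+1, -1), (+1, -1), (0, 1), (0, 1))
)
chsh = correlator(0, 0) + correlator(0, 1) + correlator(1, 0) - correlator(1, 1)
print(reproduces_bb84, chsh)
```

Exact rational arithmetic is used so that the comparison is an equality rather than a tolerance check.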

This observation led to the awareness that most tools developed in quantum information theory, 
from simple geometric objects like entanglement witnesses to elaborate constructions like security 
proofs for QKD protocols, rely on some characterization of the devices, usually a very accurate one. 
Nonlocality can dispense from this characterization. Certification through nonlocality is not just 
interesting for foundational concerns like guaranteeing security against hypothetical no-signaling 
adversaries: It is relevant for quantum information tasks as feasible today. 

The next breakthrough followed from this insight: A security proof of QKD against a quantum 
adversary based on nonlocality (Acin, Brunner, Gisin, Massar, Pironio, and Scarani, 2007). In its 
title, this level of certification was called device-independent, and the wording stuck. This paper was 
the one that properly triggered the field. From a revered pioneering insight ready for storage in a 
museum, Bell nonlocality had become one of the most powerful tools for the certification of devices. 


F.2 Overview of Tasks 


Primarily, one can certify that the device is a nonlocal resource in a DI way. There is no exhaustive
closed list of what other certifications may follow from that one. This section is a quick review of 
what has been considered so far. 


F.2.1 Certification of secrecy tasks: Key distribution 
and randomness 


From the history in the previous section, we know that DI started from studying the security 
of key distribution (Mayers and Yao, 1998; Barrett et al, 2005a; Acin et al, 2007). A posteriori, 
this was not the most logical beginning: Randomness certification would have been more natural, 
key distribution being an advanced form of randomness extraction involving two distant players. 
The idea of DI certification of randomness appears first in the Ph.D. thesis of Roger Colbeck 
under the supervision of Adrian Kent [that material was eventually published (Colbeck and Kent, 
2011)]. However, the proper theoretical tools were introduced alongside a proof-of-principle
experiment⁴ in (Pironio et al, 2010). A further conceptual breakthrough was the introduction of
randomness amplification (Colbeck and Renner, 2012), that we have presented in section 11.5. 
The amount of theoretical works exploring the DI certification of randomness grew rapidly: 
For an overview, I refer the reader to the two review articles available on the topic (Pivoluska and 
Plesch, 2014; Acin and Masanes, 2016). Most works adopted the same security assumptions of 
QKD, with an adversary that can prepare the state and keep quantum side-information, and thus 


4 That experiment used entangled ions and could certify 42 random bits. A more recent experiment with 
entangled photons (Shen et al, 2018) certified 617,000 random bits in a 42-minute run. Both sets of authors 
state that the appearance of 42 was purely coincidental and is not a wink to Douglas Adams’ The Hitchhiker’s 
Guide to the Galaxy, where “42” is the “Answer to the Ultimate Question of Life, the Universe, and Everything”.

focused on randomness expansion (which was at times termed “randomness amplification,” before
the latter took up the current meaning). 

Experiments have already demonstrated the generation of randomness from the violation 
of Bell inequalities against adversaries with quantum side-information (Liu et al, 2018; Shen 
et al, 2018) or with classical side-information on no-signaling resources (Bierhorst et al, 2018). 
An experimental demonstration of DIQKD is still lacking at the moment of writing, due to the 
additional requirements discussed in subsection 8.4.2. 


F.2.2 Certification of quantum resources 


Once one assumes that quantum theory holds, nonlocality certifies the presence of entangled states 
and the ability of performing incompatible measurements. As previously mentioned, one can obtain 
quantitative estimates, i.e., lower bounds for the amount of entanglement in the state, and for any 
measure of incompatibility of the measurements. We have not discussed these DI certifications 
in the main text, as the role of nonlocality is very similar to the case of randomness: Only, one 
minimizes a measure of entanglement rather than a measure of randomness (Moroder et al. 2013). 
As witnesses of the versatility of DI certification, we can cite the possibility of certifying genuine 
multipartite entanglement (Bancal et al, 2011b) and a scheme to guarantee that a measurement 
device has entangled eigenstates (Rabelo et al., 2011). 

We have devoted chapter 7 to the remarkable possibility of self-testing, which was discovered
twice. First, the self-testing character of $S = 2\sqrt{2}$ was noticed in the context of nonlocality
(Tsirel’son, 1987; Summers and Werner, 1987; Popescu and Rohrlich, 1992b). Second, it appeared in the
works of Mayers and Yao mentioned above, which are so focused on cryptography that they fail
to mention anything about nonlocality. Strictly speaking, self-testing applies only to extremal
behaviors. However, given any behavior, one can consider approximate self-testing with reference 
to one of these ideal cases. This makes self-testing suitable for certification. However, if a 
certification can be done directly, like that of randomness or entanglement, passing through an 
intermediate self-testing step can only worsen the bound.⁵ Rather, self-testing may be helpful
if the goal is to test the inner working of an all-purpose machine—a quantum computer. This 
line of thought was pioneered by the paper of Reichardt, Unger, and Vazirani (RUV) (2013). 
The main challenge of such a proof is how to go beyond the i.i.d. description: One would like to
prove that observing $S \approx 2\sqrt{2}$ on N rounds certifies the presence of approximately N singlets,
without even assuming a tensor product structure between the degrees of freedom measured in
each round! RUV managed this theoretical feat, albeit with a very bad scaling as N grows. Further
works have significantly improved the construction. For an overview of these steps and arguably
the best construction at the moment of writing, we direct the reader to (Coladangelo et al., 20170). 
For a more pedestrian example of the self-testing of two singlets, see (Wu et al, 2016). 


F.2.3 Dimensionality of the physical system 


The notion of randomness is independent of quantum theory, while the notion of entanglement 
is only defined in the theory. The notion of dimensionality of a physical system lies half-way. It is 


5 It is not obvious how to obtain a bound on, say, guessing probability from a bound on state fidelity after
local isometries; but one example would suffice: Singlet fidelities $F \leq \frac{1}{2}$ are compatible with separable states,
so from that bound alone one would have to infer $P_{\text{guess}} = 1$. However, if the behavior is nonlocal, there is
certifiable randomness in it.

naturally defined in quantum theory as the dimensionality of the Hilbert space that describes the
system. But one can give it a theory-independent meaning, as the maximal capacity of the system 
to encode information. 

Now, one can ask: what is the minimal dimension of the system required to produce a given 
behavior? Because it depends only on the behavior, this lower bound on the dimension can be 
certified in a DI way. However, for this assessment nonlocality is not required (Wehner et al, 
2008). For instance, the local behavior (F.1) can be realized with two qubits, which is obviously the
minimal dimension, but requires entanglement. It can also be realized with two four-valued strings, 
which have higher dimension but are trivially achievable with classical means. By rewriting results 
on communication complexity, one can find examples of local behaviors with an exponential gap 
between the classical and the quantum dimensionality, but again the latter require more and more 
complex entangled states. It is therefore not clear which resource is smaller, the lower-dimensional 
entangled state or the higher-dimensional classical realization. 

When the behavior is nonlocal, no classical model can reproduce it, and a dimension witness 
based on nonlocality certifies the minimal dimension of the degrees of freedom that are entangled. 
The first such study proved that a sufficiently high violation of the CGLMP inequality for 
m = 3 cannot be achieved with qubits, and is therefore a witness of dimension d ≥ 3 on both Alice
and Bob (Brunner et al., 2008). A short time later, it was proved that witnesses for any dimension 
can be found in Bell scenarios with binary outputs, increasing the number of inputs (Vértesi and 
Pal, 2009). More recently, it was noticed that even nonlocality-based dimension witnesses are not 
exempt from ambiguities (Cong et al., 2017). 


F.3 Device-Independent, Really?


Like every successful label, “device-independent” may lead to emotions: Those who possess it 
may overstate its relevance, those who don’t possess it may either despise it or try to claim it for 
themselves. All of this has happened. Sociology of science is not the topic of this book, but a subtle 
confusion must be addressed. We have already hinted at it when discussing the difference between
randomness and secret randomness (subsection 8.1.3). 


F.3.1 Providers and adversaries 


If we trust everyone’s competence and honesty, there is no need for certification. Certification 
should be conclusive when the provider is untrusted. Misunderstanding arose from confusing the 
untrusted provider with the adversary as considered in cryptographic tasks. 

The untrusted provider may manufacture poor quality devices: If he is aware of it, he is dishonest; 
if not, he is simply incompetent. The cryptographic adversary has a completely different goal: She 
is interested in the product of the operation of the device. To put it as a caricature: The untrusted 
provider may sell you an expensive handphone that does not connect to the Internet, the adversary 
will sell you a cheap handphone that connects perfectly well to the Internet and on which she 
can spy. 

Let us apply this understanding to DI certification of QKD. Alice and Bob take the devices and 
perform a loophole-free Bell test. Based on the observed behavior, they certify that the device works 
as it should: It has the capacity of producing secret keys. But producing the key is not enough: It’s 
meant to be a secret key, so the information should not leak out at any later time. If the device 
comes from the adversary, in all likelihood it contains an emitter that does exactly that: Leak out
the produced keys to her at the end of the process.⁶ At the price of some redundancy, let me stress that
the emitter is not used to produce nonlocality through communication: That was duly tested by 
closing the locality loophole. But leakage at a later time can’t be prevented based on space-like 
separation. 

In summary, DI certification is really device-independent and applies to any device, whether 
manufactured by an untrusted provider (allegedly incompetent, allegedly dishonest, or both) or 
by the adversary. But in order to produce actual secret keys, Alice and Bob must be sure that there 
is no leakage. This cannot be checked in a DI way; at the very least, the adversary must not have 
been involved in the manufacturing. 

In view of this discussion, some DI studies have assumed that the devices are manufactured 
by honest⁷ providers (e.g., our colleagues in academia). One may then accept to relax some of the
requirements for a loophole-free Bell test. Notably, to exploit the locality loophole
one must maliciously engineer communication channels that specifically send information about
the inputs, whereas the detection loophole could be opened unwittingly (recall Appendix B.3).
Thus, an honest provider may be dispensed from closing the locality loophole, while the detection 
loophole must be closed to claim DI. 


F.3.2 Of labels and men 


If a certification leaves some loopholes open because this is reasonable under some model of 
provider, does it still deserve the label “device-independent”? Is it accurate to speak of “device- 
independent security of QKD,” given that no leakage must be assumed to actually produce secret 
keys? I prefer not to take strong stances on these matters. Ultimately, labels have a limited value 
for scientists. 

What matters for this book is that we appreciate the unique power of nonlocality as a tool of 
certification. What matters for science in general is that the assumptions underlying the claims are 
clearly stated. In this last respect, the field of DI has made a positive impact on the whole of quantum 
information. In particular, there has been a growing interest in intermediate levels of certification. 
We devote section F.4 to them.


F.4 Certification with Partially Characterized Devices


Inspired by DI certification, several certifications have been proposed that rely on a partial 
characterization of the devices. The following list cannot be exhaustive: Anyone who has a good 
reason to make an assumption should feel free to define the corresponding certification. It is always 
assumed that quantum theory is valid. 


F.4.1 Characterizing specific devices 


This form of departure from DI certification is of particular appeal to experimentalists who 
(justifiably) are pretty confident of their understanding of the working of their devices.


6 A manufacturing adversary may use subtler, more complicated tricks to learn the key in the context of 
QKD (Barrett et al., 2013). 
7 Usually termed trusted providers in the literature, but we saw that the wording is ambiguous.

• The assumption of fair-sampling at detection is the most famous such example (recall
subsection 1.5.2). The attitude of the community towards this assumption has undergone
fluctuations. For several years, it had been considered absolutely natural. Experimental 
groups were working towards closing the detection loophole for the glory of having done 
it and for the exciting technical challenges that it entailed, but nobody was taking the threat 
seriously. The mood swung almost completely at the height of excitement for DI certification, 
when theorists started pushing for taking the detection loophole as a serious threat—and they 
were listened to, although an actual experiment played also an important role (Gerhardt 
et al, 2011). 

That being said, it would be too sectarian to put an anathema on every experiment that 
does not close the detection loophole: Strong as it may be, fair-sampling remains a legitimate 
assumption to be stated, especially if one is trying to demonstrate something other than strict 
Bell nonlocality or DI certification. Besides, the fair-sampling assumption can be relaxed to a
Santha-Vazirani assumption by adapting the certification of nonlocality under measurement 
dependence (chapter 11). With these techniques, it is enough to assume that, in each round, 
each detector must have non-zero probability of firing (Putz et al., 2016). 


• One could assume that a device other than the detector behaves “as it should.” For instance,
by assuming that a beam-splitter does not introduce any temporal delay, one can demonstrate 
nonlocality in experiments with time-bin entanglement that otherwise have a local variable 
description (Martin et al, 2013). 


• Bell nonlocality with optical fields is a sub-field that we have not discussed in this book.
While there exist proposals of proper Bell tests (Banaszek and Wodkiewicz, 1998; García- 
Patrón et al, 2004; Zukowski et al, 2016), a lot of works in the literature relied on optical 
assumptions. This is due to the fact that it’s easy to find Bell inequalities for measurements of 
the photon number N, but no easily-implemented measurement gives access to this degree 
of freedom, while the X and P quadrature operators are associated to the commonplace 
homodyne or heterodyne measurements. Now, in quantum optics, it holds $N = \frac{1}{2}(X^2 + P^2 - 1)$
for a monomode field. Thus, one can perform quadrature measurements, derive
a value of N in each round, and compute its statistics. However, if one were to replace N 
with X and P into the Bell inequality, the resulting expression can be violated with LVs. 


• The same kind of assumption is made in the verification of a family of Bell inequalities for
many-body systems (Tura et al. 2014; Schmied et al, 2016). Once again, the theoretical 
objects are proper Bell inequalities that contain only two-body correlators. Two-body 
correlators can be measured in atomic gases, but only with all particles subject to the same 
input; while the inequality needs also correlators when the two inputs are different. Adding 
the notion that the measurement of a spin is a measurement of a direction, the desired 
correlators can be extracted by comparing feasible measurements along different directions. 
Indeed, $\vec{n}\cdot\vec{m} = \frac{1}{2}\left[|\vec{n} + \vec{m}|^2 - |\vec{n}|^2 - |\vec{m}|^2\right]$.

• Finally, some certifications have been made under the assumption that one knows the
dimension of the Hilbert space of the system under study, but nothing about the measurements. 
In abstract terms, the assumption is valid; but it is not clear on what evidence it can be based. 
Recently, a much more promising approach has been proposed to deal with more physical 
assumptions like upper bounds on the energy or on the number of photons (Van Himbeeck 
et al, 2017).


F.4.2 Steering: Uncharacterized Alice, characterized Bob


Many presentations of entanglement stress the idea that, by making a measurement, Alice can steer 
Bob’s state into one or another basis. For instance, if the shared state is $|\Phi^+\rangle_{AB}$, by measuring
her qubit in the $Z_A$ basis, Alice will prepare either $|{+\hat{z}}\rangle_B$ or $|{-\hat{z}}\rangle_B$ on Bob’s side. But she could
have measured in the $X_A$ basis, and then prepare either $|{+\hat{x}}\rangle_B$ or $|{-\hat{x}}\rangle_B$. So, Alice could tell Bob
in each round which state he is supposed to have. This looks impressive because we have fully
characterized the joint state of both Alice and Bob and we know it to be $|\Phi^+\rangle_{AB}$: It would look
far less impressive if that state had been an incoherent mixture of $|0\rangle_A|{+\hat{z}}\rangle_B$, $|1\rangle_A|{-\hat{z}}\rangle_B$,
$|2\rangle_A|{+\hat{x}}\rangle_B$ and $|3\rangle_A|{-\hat{x}}\rangle_B$: In this case, Alice’s measurement in the computational basis reveals
which state already exists on Bob’s side, so there is no steering. Could Bob be convinced that Alice 
can steer his state without knowing anything about Alice’s degree of freedom? 

This question led to the formal characterization of EPR-steering (Wiseman et al., 2007): Alice 
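won't be able to convince Bob that she is steering his state if there exist normalized states $\rho_B^\lambda$ of
Bob’s system such that

$$\rho_B = \int d\lambda\, q(\lambda)\, \rho_B^\lambda \qquad \text{(F.4)}$$

and the observed behavior can be written, with $E^y_b$ denoting Bob’s characterized POVM elements,

$$P(a,b|x,y) = \int d\lambda\, q(\lambda)\, P(a|x,\lambda)\, \mathrm{Tr}(\rho_B^\lambda E^y_b). \qquad \text{(F.5)}$$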


It is evident that Bell nonlocality implies steering, as (F.5) is a local behavior. Also, if the joint
state $\rho_{AB}$ is separable, obviously (F.5) holds together with (F.4). That being said, EPR-steering
is provably different from both nonlocality and tomography: There exist local behaviors that
demonstrate steering, and there exist entangled states whose state-behavior cannot demonstrate
steering. For many more results and all references, we refer the reader to the review (Cavalcanti and
Skrzypczyk, 2017).


F.4.3 Characterized quantum inputs 


Francesco Buscemi (2012) introduced the idea of Bell-type tests in which the inputs are quantum 
states: To any $x \in \mathcal{X}$ there corresponds a state $\xi_x$, and to any $y \in \mathcal{Y}$ there corresponds a state $\psi_y$.
The players know the possible states but may not know which state is sent in each round. He proved
several results in this scenario.⁸ We are interested in a corollary: Given any shared entangled state
$\rho_{AB}$, in this scenario the players are able to verify that the state is entangled indeed. He proved
that every entangled state can be detected in such a scenario. The connection with certification 
was first emphasized in (Cavalcanti et al, 2013). One can turn these techniques into a fully DI 
detection of entanglement by self-testing the input states (Bowles et al, 2018). 


8 Notably, he could define a necessary and sufficient condition for the possibility of transforming one state 
into another with local operations and shared randomness (LOSR).

Following (Branciard et al, 2013), let us first describe the procedure with fully characterized
devices. As mentioned in Appendix C.2, an entanglement witness is an operator $W$ such that
$\mathrm{Tr}(\rho W) \geq 0$ for all separable states $\rho$ and $\mathrm{Tr}(\rho W) < 0$ for some entangled states. An entanglement
witness for two-qudit states can always be decomposed, in a non-unique way, as


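$$W = \sum_{x,y} w_{xy}\, \xi_x^T \otimes \psi_y^T \qquad \text{(F.6)}$$

where the $\xi_x$ and $\psi_y$ are qudit states, $T$ indicates transpose and $w_{xy}$ are some real coefficients.
Suppose Alice and Bob share the two-qudit state $\rho_{AB}$. If Alice is given an auxiliary system in the
state $\xi_x$, she can perform on it and her system the joint projective measurement that yields $a = 1$
for the state $\frac{1}{\sqrt{d}}\sum_i |i\rangle \otimes |i\rangle$ and zero otherwise; Bob follows the analogous measurement scheme. It is
then easy to prove that $P(1,1|x,y) = \frac{1}{d^2}\mathrm{Tr}(\rho_{AB}\,\xi_x^T \otimes \psi_y^T)$, so that $\sum_{x,y} w_{xy} P(1,1|x,y) = \frac{1}{d^2}\mathrm{Tr}(\rho_{AB} W)$;
the positive factor $1/d^2$ does not affect the sign.
By definition of an entanglement witness,

$$\sum_{x,y} w_{xy} P(1,1|x,y) < 0 \qquad \text{(F.7)}$$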
Xy 


is only possible if $\rho_{AB}$ is entangled. Conversely, since entanglement witnesses exist for any
entangled $\rho_{AB}$, every entangled state can be detected by a suitable choice of the input states $\xi_x$
and $\psi_y$.

The narrative of certification with partial characterization follows straightforwardly: The players
claim to have the state $\rho_{AB}$ that is entangled, and the verifier wants to check that. In each round,
the verifier sends one of the states $\xi_x$ to Alice and one of the states $\psi_y$ to Bob. The players return
their binary outputs $a, b \in \{0, 1\}$, and the verifier can compute (F.7).
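
The key step, relating the probability of the joint outcome (1,1) to the trace of the shared state against the transposed input states, can be checked numerically. The sketch below assumes NumPy is available, draws random qubit input states and a random two-qubit shared state (the helper `random_state` is a hypothetical convenience, not a library call), and orders the systems as Alice's input, Alice's share, Bob's share, Bob's input. With projections on the normalized maximally entangled state, the two sides agree up to the positive factor 1/d² per pair of measurements.

```python
import numpy as np

def random_state(dim, rng):
    """Random density matrix (illustrative helper): G G^dagger with unit trace."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def max_entangled_projector(d):
    """Projector on (1/sqrt(d)) * sum_i |i>|i>."""
    phi = np.eye(d).reshape(d * d) / np.sqrt(d)
    return np.outer(phi, phi.conj())

def p11(rho_ab, xi, psi, d):
    """P(1,1|x,y) with systems ordered as A' (input xi), A, B, B' (input psi)."""
    total = np.kron(np.kron(xi, rho_ab), psi)
    proj = np.kron(max_entangled_projector(d), max_entangled_projector(d))
    return np.trace(proj @ total).real

d = 2
rng = np.random.default_rng(0)
rho_ab = random_state(d * d, rng)          # shared two-qudit state
xi, psi = random_state(d, rng), random_state(d, rng)  # quantum inputs

lhs = p11(rho_ab, xi, psi, d)
rhs = np.trace(rho_ab @ np.kron(xi.T, psi.T)).real / d**2
print(lhs, rhs)
```

Note that the plain transpose (not the conjugate transpose) of the input states appears on the right-hand side, which is why the witness decomposition is written in terms of transposed states.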


F.4.4 Networks with uncorrelated sources (a.k.a. N-locality) 


The setting for this last subsection is that of a network configuration. To fix the idea, consider 
first entanglement swapping. In a fully characterized description, Alice prepares the entangled pair 
and sends one of the subsystems to Charlie; so does Bob; then Charlie performs an entangling 
measurement, conditioned on whose outcome the systems of Alice and Bob end up entangled. 
In a fully device-independent description, Alice, Bob, and Charlie are the players and may have 
prepared any joint strategy: What happens in the intermediate stages stays in the black box, and 
the whole procedure is interpreted as a source creating entanglement between Alice and Bob. 

But from a physics perspective, the physicists operate devices that are sources of entanglement:
It is very natural to assume that Alice’s and Bob’s sources are uncorrelated (Figure F.1, top). Under
this assumption, called bilocality, local three-partite behaviors are described by

$$P(a,b,c|x,y,z) = \int d\lambda_A\, q_A(\lambda_A) \int d\lambda_B\, q_B(\lambda_B)\, P(a|x,\lambda_A)\, P(b|y,\lambda_B)\, P(c|z,\lambda_A,\lambda_B) \qquad \text{(F.8)}$$

which is more restrictive than the general $\int d\lambda\, q(\lambda) P(a|x,\lambda) P(b|y,\lambda) P(c|z,\lambda)$. It should be clear how
to extend the idea to other networks, and one would speak of N-locality or network-locality.
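
To make the bilocal structure concrete, here is a small sketch with two independent hidden bits, one per source. The distributions and the deterministic response functions are arbitrary illustrative choices. The code builds the three-partite behavior from the product form, checks that Alice and Bob are uncorrelated once Charlie's output is summed over, and shows that conditioning on Charlie's output correlates them.

```python
from fractions import Fraction
from itertools import product

# Hypothetical independent sources: lambda_A and lambda_B are bits.
Q_A = {0: Fraction(1, 2), 1: Fraction(1, 2)}
Q_B = {0: Fraction(1, 3), 1: Fraction(2, 3)}

# Illustrative deterministic responses with outputs in {+1, -1}.
def alice(x, lam_a):
    return 1 - 2 * (lam_a ^ x)

def bob(y, lam_b):
    return 1 - 2 * (lam_b & y)

def charlie(z, lam_a, lam_b):
    return 1 - 2 * ((lam_a ^ lam_b) if z == 0 else (lam_a & lam_b))

def behavior(a, b, c, x, y, z):
    """Bilocal P(a,b,c|x,y,z): product distribution over (lambda_A, lambda_B)."""
    return sum((Q_A[la] * Q_B[lb]
                for la, lb in product((0, 1), repeat=2)
                if alice(x, la) == a and bob(y, lb) == b and charlie(z, la, lb) == c),
               Fraction(0))

def p_a(a, x):
    return sum((Q_A[la] for la in (0, 1) if alice(x, la) == a), Fraction(0))

def p_b(b, y):
    return sum((Q_B[lb] for lb in (0, 1) if bob(y, lb) == b), Fraction(0))

# Summing over c, Alice and Bob factorize for every choice of inputs.
factorizes = all(
    sum(behavior(a, b, c, x, y, z) for c in (+1, -1)) == p_a(a, x) * p_b(b, y)
    for a, b, x, y, z in product((+1, -1), (+1, -1), (0, 1), (0, 1), (0, 1))
)

# Conditioning on c = +1 (inputs x=0, y=1, z=0) correlates a and b.
x, y, z, c = 0, 1, 0, +1
p_c = sum(behavior(a, b, c, x, y, z) for a, b in product((+1, -1), repeat=2))
joint = behavior(+1, +1, c, x, y, z) / p_c
cond_a = sum(behavior(+1, b, c, x, y, z) for b in (+1, -1)) / p_c
cond_b = sum(behavior(a, +1, c, x, y, z) for a in (+1, -1)) / p_c
print(factorizes, joint, cond_a * cond_b)
```

The same pattern extends to larger networks by adding one independent variable per source and letting each party's response depend only on the sources it is connected to.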

The characterization of N-locality is challenging, notably because the set of N-local behaviors is 
not convex. There is no comprehensive review at the moment of writing, but the reader can gather

Figure F.1 Top: Bilocality scenario modeling the physical understanding of entanglement swapping:
Nonlocality is probed between Alice and Bob; the resources $\lambda_A$ and $\lambda_B$ are independent, so all correlation
must come from conditioning on the output c. Bottom: A three-locality scenario with no inputs that
describes a normal Bell test on A and B, with measurement independence explicitly enforced.


a good amount of information from (Branciard et al, 2012b,a; Fraser and Wolfe, 2018; Renou 
et al, 2019). 

Let me finish with an intriguing remark. Consider the three-locality scenario of Figure F.1,
bottom. It has no inputs. The shared resources $\Phi_{XA}$ and $\Phi_{YB}$ are such that both players obtain
the same output in each round (you can think of a maximally entangled state measured in the
computational basis, but it could be the corresponding local resource too). Thus, Alice can use
her half of $|\Phi\rangle_{XA}$ to generate the input x for a Bell test on $\rho_{AB}$, and Bob can behave similarly. In
this sense, Figure F.1 bottom could be read as a version of Figure 1.1, describing a generic Bell
test: Indeed, the maximally entangled states are arguably the most perfect realization of a quantum
random number generator that reveals its output, and the assumption of three-locality is nothing
else than measurement independence for the Bell test to be performed on $\rho_{AB}$.


Appendix G 
Repository of Technicalities 


This appendix contains some instructive technical results that would have slowed down the pace 
of the main text. 


G.1 Analytical Solution of the Facets of the CHSH 
Correlation Polytope 


It is instructive to work out explicitly the facets of $\mathcal{L}$ and derive the corresponding Bell inequalities.
The simplest scenario has $M_A = M_B = m_A = m_B = 2$. In this case, $D_{NS} = 8$ and there are 16
extremal points: Finding the facets is a very easy task for a computer, but still cumbersome to write
down here. We are rather going to study a very meaningful sub-polytope of $\mathcal{L}$.

For a choice of settings (x, y) and binary outputs, the correlation coefficient is defined by

$$E_{xy} = P(a = b|x,y) - P(a \neq b|x,y). \qquad \text{(G.1)}$$

Any quadruple of numbers

$$u = (E_{00}, E_{01}, E_{10}, E_{11}) \qquad \text{(G.2)}$$

with $-1 \leq E_{xy} \leq 1$ is a priori a valid correlation vector. The sixteen vectors such that $\|u\|^2 = 4$,
i.e., those vectors whose components are either +1 or −1, are extremal points of a polytope
embedded in $\mathbb{R}^4$.

To see which constraints are added by requiring that $P \in \mathcal{L}$, it is convenient to use the labeling
convention $a, b \in \{-1, +1\}$. With this choice, $E_{xy} = \langle a_x b_y \rangle$. In particular, for LD processes it holds
$E_{xy} = a_x b_y$, which directly leads to $E_{00}E_{01}E_{10}E_{11} = 1$. Therefore, the extremal points of the local
correlation polytope are the eight vectors

$$\begin{aligned}
v_1 &= (+1, +1, +1, +1), & v_5 &= -v_1 = (-1, -1, -1, -1) \\
v_2 &= (+1, +1, -1, -1), & v_6 &= -v_2 = (-1, -1, +1, +1) \\
v_3 &= (+1, -1, +1, -1), & v_7 &= -v_3 = (-1, +1, -1, +1) \\
v_4 &= (+1, -1, -1, +1), & v_8 &= -v_4 = (-1, +1, +1, -1).
\end{aligned} \qquad \text{(G.3)}$$


Notice that $\{v_1, v_2, v_3, v_4\}$ are mutually orthogonal, so in particular they are linearly independent:
This implies that $\mathbb{R}^4$ is the smallest embedding for the local correlation polytope. Now we want to
characterize the facets of this polytope. Four linearly independent vectors are required to define a
3-dimensional hyperplane:¹ Our task consists of listing all sets of four linearly independent
extremal points, constructing the hyperplane that they generate, and checking if it is indeed a
facet.

The symmetry of the problem makes the task simple. There are sixteen sets of four linearly
independent vectors, namely the $V_s = \{s_1v_1, s_2v_2, s_3v_3, s_4v_4\}$ with $s = [s_1, s_2, s_3, s_4] \in \{-1,+1\}^4$.
The normal to the hyperplane generated by $V_s$ is the solution to the four linearly independent
equations $n_s \cdot (s_k v_k) = 4$ for $k = 1, \ldots, 4$ (the constant being chosen for simplicity). It is readily
found to be $n_s = \sum_k s_k v_k$, either by direct inspection or by noticing that $v_i \cdot v_j = 4\delta_{ij}$ for $1 \le i, j \le 4$.
Moreover, the extremal points that do not sit on the hyperplane defined by $V_s$ are the four
$-s_k v_k$, which all lie on the same side of the hyperplane since obviously $n_s \cdot (-s_k v_k) = -4$. Therefore
each of the sixteen sets $V_s$ defines a facet by the condition

$$n_s \cdot u \le 4 \quad\text{with}\quad n_s = \sum_{k=1}^{4} s_k v_k. \tag{G.4}$$

To finish our study, we just have to inspect each of these facets (since $n_{-s} = -n_s$, we can just look
at eight of them). We find:

$$\begin{aligned}
n_{[+1,+1,+1,+1]} &= (4, 0, 0, 0) &&\longrightarrow\ 4E_{00} \le 4,\\
n_{[+1,+1,+1,-1]} &= (2, 2, 2, -2) &&\longrightarrow\ 2E_{00} + 2E_{01} + 2E_{10} - 2E_{11} \le 4,\\
n_{[+1,+1,-1,+1]} &= (2, 2, -2, 2) &&\longrightarrow\ 2E_{00} + 2E_{01} - 2E_{10} + 2E_{11} \le 4,\\
n_{[+1,-1,+1,+1]} &= (2, -2, 2, 2) &&\longrightarrow\ 2E_{00} - 2E_{01} + 2E_{10} + 2E_{11} \le 4,\\
n_{[+1,+1,-1,-1]} &= (0, 4, 0, 0) &&\longrightarrow\ 4E_{01} \le 4,\\
n_{[+1,-1,+1,-1]} &= (0, 0, 4, 0) &&\longrightarrow\ 4E_{10} \le 4,\\
n_{[+1,-1,-1,+1]} &= (0, 0, 0, 4) &&\longrightarrow\ 4E_{11} \le 4,\\
n_{[+1,-1,-1,-1]} &= (-2, 2, 2, 2) &&\longrightarrow\ -2E_{00} + 2E_{01} + 2E_{10} + 2E_{11} \le 4.
\end{aligned} \tag{G.5}$$


In summary, the local correlation polytope for the simplest scenario has sixteen facets. Up to
relabeling of the inputs and/or of the outputs, eight of these describe the constraint $E_{00} \le 1$ and
are therefore trivial, while the other eight describe the constraint

$$S = E_{00} + E_{01} + E_{10} - E_{11} \le 2. \tag{G.6}$$

This constraint is not trivial, and indeed can be violated by valid correlation vectors which do not
belong to the local polytope: In particular, the vector $w = (+1, +1, +1, -1)$ reaches up to $S = 4$.
In fact, we know that this is the CHSH inequality (2.29).
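The enumeration of the sixteen facets is small enough to be checked by machine. The following is a minimal sketch, not part of the original derivation: it builds the normals $n_s$ of (G.4) from the vectors $v_1, \ldots, v_4$ of (G.3) and verifies that each one is saturated by exactly four local extremal points. The function and variable names are illustrative choices.

```python
import itertools
import numpy as np

# Mutually orthogonal extremal points v1..v4 of (G.3); the other four are their negatives.
V = np.array([
    [+1, +1, +1, +1],
    [+1, +1, -1, -1],
    [+1, -1, +1, -1],
    [+1, -1, -1, +1],
])
local_vertices = np.vstack([V, -V])

def facet_normal(signs):
    """Normal n_s = sum_k s_k v_k of (G.4) for a sign vector s in {-1,+1}^4."""
    return np.asarray(signs) @ V

# All sixteen normals, keyed by the sign vector s.
normals = {s: facet_normal(s) for s in itertools.product((+1, -1), repeat=4)}

# Every local vertex satisfies n_s . u <= 4, with equality on exactly four of them.
for n in normals.values():
    values = local_vertices @ n
    assert values.max() == 4 and np.sum(values == 4) == 4

# Normals with a single non-zero entry give the trivial constraints |E_xy| <= 1;
# those with four non-zero entries give the CHSH-type constraints.
trivial = [n for n in normals.values() if np.count_nonzero(n) == 1]
chsh_like = [n for n in normals.values() if np.count_nonzero(n) == 4]

# The non-local vector w violates the facet with normal (2, 2, 2, -2): n . w = 8 > 4.
w = np.array([+1, +1, +1, -1])
violation = facet_normal((+1, +1, +1, -1)) @ w
```

The split into `trivial` and `chsh_like` reproduces the count of eight plus eight facets stated in the summary, and `violation` corresponds to $S = 4$ in (G.6).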


G.2 Hardy's Test from the Schmidt 
Decomposition of the State 


This appendix repeats the calculation of Hardy’s test (subsection 4.4.1) starting from the two-qubit 
pure state written in the Schmidt decomposition (3.15). 


¹ This is because the origin $(0, 0, 0, 0)$ is inside the polytope. If the origin were on a facet, only three linearly
independent vectors would be sufficient to specify such a facet. This is a technical point that does not need to
bother us here, but may play a role for most compact parametrizations of probability polytopes.

Without loss of generality, the direction of Alice's first measurement can be written

$$\hat{a}_0 = \cos\alpha_0\,\hat{z} + \sin\alpha_0\,\hat{x}. \tag{G.7}$$


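This choice, together with the constraints (1.11), will fix all the other measurements in a sequential
way. It is convenient to work with the projector on the state (3.15)

$$\Pi_\theta = \frac{1}{4}\left[\mathbb{I}\otimes\mathbb{I} + \cos 2\theta\,(\sigma_z\otimes\mathbb{I} + \mathbb{I}\otimes\sigma_z) + \sigma_z\otimes\sigma_z + \sin 2\theta\,(\sigma_x\otimes\sigma_x - \sigma_y\otimes\sigma_y)\right].$$

The output $+1$ of Alice's measurement is associated with the projector $\Pi^+_{a_0} = \frac{1}{2}(\mathbb{I} + \hat{a}_0\cdot\vec{\sigma})$; when
this output is found, the state prepared on Bob's side must correspond to $\Pi^-_{b_0}$, in order to satisfy
the constraint $P(a_0 = b_0 = +1) = 0$. Noting that $\Pi^+_{a_0}\sigma_z\Pi^+_{a_0} = \cos\alpha_0\,\Pi^+_{a_0}$, $\Pi^+_{a_0}\sigma_x\Pi^+_{a_0} = \sin\alpha_0\,\Pi^+_{a_0}$
and $\Pi^+_{a_0}\sigma_y\Pi^+_{a_0} = 0$, one finds $\Pi^+_{a_0}\Pi_\theta\Pi^+_{a_0} = \frac{1+\cos 2\theta\cos\alpha_0}{2}\,\Pi^+_{a_0}\otimes\Pi^-_{b_0}$ with

$$\Pi^-_{b_0} = \frac{1}{2}\left(\mathbb{I} + \frac{\cos 2\theta + \cos\alpha_0}{1+\cos 2\theta\cos\alpha_0}\,\sigma_z + \frac{\sin 2\theta\sin\alpha_0}{1+\cos 2\theta\cos\alpha_0}\,\sigma_x\right).$$

That is, we have found

$$\hat{b}_0 = \cos\beta_0\,\hat{z} + \sin\beta_0\,\hat{x}, \quad\text{with}\quad
\begin{cases}
\cos\beta_0 = -\dfrac{\cos 2\theta + \cos\alpha_0}{1+\cos 2\theta\cos\alpha_0}\\[2mm]
\sin\beta_0 = -\dfrac{\sin 2\theta\sin\alpha_0}{1+\cos 2\theta\cos\alpha_0}
\end{cases} \tag{G.8}$$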


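The calculation for $\hat{b}_1$ follows by replacing $\alpha_0$ with $\pi + \alpha_0$, and because of the second constraint
we must read it as $\Pi^-_{a_0}\Pi_\theta\Pi^-_{a_0} = \frac{1-\cos 2\theta\cos\alpha_0}{2}\,\Pi^-_{a_0}\otimes\Pi^-_{b_1}$. Therefore

$$\hat{b}_1 = \cos\beta_1\,\hat{z} + \sin\beta_1\,\hat{x}, \quad\text{with}\quad
\begin{cases}
\cos\beta_1 = -\dfrac{\cos 2\theta - \cos\alpha_0}{1-\cos 2\theta\cos\alpha_0}\\[2mm]
\sin\beta_1 = \dfrac{\sin 2\theta\sin\alpha_0}{1-\cos 2\theta\cos\alpha_0}
\end{cases} \tag{G.9}$$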


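Finally, the third constraint leads to $\Pi^-_{b_0}\Pi_\theta\Pi^-_{b_0} = \frac{1-\cos 2\theta\cos\beta_0}{2}\,\Pi^-_{a_1}\otimes\Pi^-_{b_0}$ with²

$$\hat{a}_1 = \cos\alpha_1\,\hat{z} + \sin\alpha_1\,\hat{x}, \quad\text{with}\quad
\begin{cases}
\cos\alpha_1 = -\dfrac{2\cos 2\theta + \cos\alpha_0\,(1+\cos^2 2\theta)}{1+\cos^2 2\theta + 2\cos 2\theta\cos\alpha_0}\\[2mm]
\sin\alpha_1 = -\dfrac{\sin^2 2\theta\,\sin\alpha_0}{1+\cos^2 2\theta + 2\cos 2\theta\cos\alpha_0}
\end{cases} \tag{G.10}$$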


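Using (G.9) and (G.10), after some tedious algebra one gets

$$\begin{aligned}
P(+,+|1,1) &= \frac{1}{4}\left(1 + \cos 2\theta\,(\cos\alpha_1 + \cos\beta_1) + \cos\alpha_1\cos\beta_1 + \sin 2\theta\,\sin\alpha_1\sin\beta_1\right)\\
&= \frac{1}{2}\,\frac{\sin^2\alpha_0\,\cos^2 2\theta\,\sin^2 2\theta}{(1+\cos^2 2\theta + 2\cos 2\theta\cos\alpha_0)(1-\cos 2\theta\cos\alpha_0)}.
\end{aligned} \tag{G.11}$$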


² The coefficients in (G.10) are obtained by taking (G.9), replacing $\beta_1 \to \alpha_1$ and $\alpha_0 \to \beta_0$, then plugging
in (G.8).

The maximization over the choice of measurement direction yields

$$\max_{\alpha_0} P(+,+|1,1) = \frac{\sin^2 2\theta\,(1-\sin 2\theta)}{(2-\sin 2\theta)^2} \quad\text{for}\quad \cos\alpha_0 = -\frac{1-\sin 2\theta}{\cos 2\theta}. \tag{G.12}$$

We find $P(+,+|1,1) = 0$ only for $\theta = 0$ and $\theta = \frac{\pi}{4}$. The first condition identifies product states,
which are expected not to show any nonlocality. The second identifies maximally entangled states,
for which $\hat{a}_0 = -\hat{a}_1 = -\hat{b}_0 = \hat{b}_1$: Both on Alice and on Bob, the measurements are identical, so
nonlocality can't be demonstrated. For every other entangled pure state of two qubits, $0 < \theta < \frac{\pi}{4}$,
$P(+,+|1,1)$ is strictly positive for every choice of $\hat{a}_0$ other than $\hat{z}$, the local Schmidt basis. Finally,
the maximal quantum violation is indeed (4.17), obtained for the state with $\sin 2\theta = 3 - \sqrt{5}$.
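The algebra of this section can be cross-checked numerically. The sketch below is an illustration under stated assumptions, not the book's own calculation. It assumes that the state (3.15) is $\cos\theta|00\rangle + \sin\theta|11\rangle$ and that the three constraints (1.11) read $P(+,+|0,0) = P(-,+|0,1) = P(+,-|1,0) = 0$, which is how the steps above use them. The measurement angles are taken from (G.8), (G.9) and (G.10) with the signs $\cos\beta_0, \sin\beta_0, \cos\beta_1, \cos\alpha_1, \sin\alpha_1$ negative and $\sin\beta_1$ positive, and all function names are hypothetical.

```python
import numpy as np

SZ = np.array([[1, 0], [0, -1]], dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)

def projector(angle, sign=+1):
    """Projector on outcome `sign` along the direction cos(angle) z + sin(angle) x."""
    return 0.5 * (np.eye(2) + sign * (np.cos(angle) * SZ + np.sin(angle) * SX))

def hardy_angles(theta, alpha0):
    """Angles (beta0, beta1, alpha1) fixed by the three constraints."""
    c, s = np.cos(2 * theta), np.sin(2 * theta)
    x, y = np.cos(alpha0), np.sin(alpha0)
    # arctan2(sin, cos): the common positive denominators of (G.8)-(G.10) drop out.
    beta0 = np.arctan2(-s * y, -(c + x))
    beta1 = np.arctan2(s * y, -(c - x))
    alpha1 = np.arctan2(-s**2 * y, -(2 * c + x * (1 + c**2)))
    return beta0, beta1, alpha1

def probability(theta, a_angle, a_sign, b_angle, b_sign):
    """P(a_sign, b_sign) on the assumed state cos(theta)|00> + sin(theta)|11>."""
    psi = np.array([np.cos(theta), 0, 0, np.sin(theta)], dtype=complex)
    op = np.kron(projector(a_angle, a_sign), projector(b_angle, b_sign))
    return float(np.real(psi.conj() @ op @ psi))

def closed_form(theta, alpha0):
    """Right-hand side of (G.11)."""
    c, s, x = np.cos(2 * theta), np.sin(2 * theta), np.cos(alpha0)
    return 0.5 * (1 - x**2) * c**2 * s**2 / ((1 + c**2 + 2 * c * x) * (1 - c * x))

theta, alpha0 = 0.3, 1.1
beta0, beta1, alpha1 = hardy_angles(theta, alpha0)
constraints = [
    probability(theta, alpha0, +1, beta0, +1),   # P(+,+|0,0)
    probability(theta, alpha0, -1, beta1, +1),   # P(-,+|0,1)
    probability(theta, alpha1, +1, beta0, -1),   # P(+,-|1,0)
]
hardy = probability(theta, alpha1, +1, beta1, +1)  # P(+,+|1,1)

# Optimal state and setting according to (G.12).
s_opt = 3 - np.sqrt(5)
theta_opt = 0.5 * np.arcsin(s_opt)
alpha0_opt = np.arccos(-(1 - s_opt) / np.cos(2 * theta_opt))
hardy_max = closed_form(theta_opt, alpha0_opt)
```

With these conventions the three constrained probabilities vanish to machine precision, `hardy` agrees with `closed_form(theta, alpha0)`, and `hardy_max` reproduces the known maximal value $(5\sqrt{5} - 11)/2 \approx 0.09$.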


G.3 Pitfalls in Handling Signaling Behaviors 


This appendix is devoted to warning against possible pitfalls when dealing with signaling behaviors 
in a too abstract way. The safest way to avoid such pitfalls is to describe the signaling resource, in 
particular the directionality of the signaling. 

We work in the $(2, 2; 2, 2)$ Bell scenario and focus on the signaling behavior defined by

$$P(0, 0|x, 0) = P(1, 0|x, 1) = 1 \quad\text{for } x \in \{0, 1\}, \tag{G.13}$$


all the other probabilities being zero by normalization. It is easy to express in words what this 
behavior describes: Alice’s input x is ignored, Alice’s output a is equal to Bob’s input y (this is the 
signaling from Bob to Alice), and Bob’s output b is deterministically 0. 

Clearly, no resource that produces this behavior can output $a$ before $y$ has been provided by Bob.
Such a timing issue is not a concern for no-signaling behaviors, with which researchers of nonlocality
are familiar. In particular, as mentioned in section 5.3, the Svetlichny polytope does not take it into
consideration. This is why the warning must be raised (Gallego et al., 2012).

To see the dangers, let us work ex absurdo and suppose that, somehow, there exists a resource 
that produces the behavior (G.13) irrespective of the time ordering. Suppose then that Alice inputs 
x and immediately sees an output a: This means that Bob will be obliged to input y = a, which 
is absurd if Bob is at a distance. This intuitive absurdity can be even translated into mathematical 
absurdity by supposing that Alice, through another channel, sends a to Bob. If Bob inputs a, the 
probability of him observing 0 would be computed as 


$$P(b=0|x) = \sum_a P(a|x)\,P(b=0|y=a; a, x) = \sum_a P(a, b=0|x, y=a) = 2.$$


Similarly, if Bob inputs y = 1 — a, one would compute 


$$P(b=0|x) = \sum_a P(a|x)\,P(b=0|y=1-a; a, x) = \sum_a P(a, b=0|x, y=1-a) = 0$$

which is also absurd given that, by inspection of the behavior, $b$ is never equal to 1 either. The
formal mistake in both equations consists in writing $P(a,b|x,y) = P(a|x)P(b|y;a,x)$ instead of
$P(a,b|x,y) = P(a|x,y)P(b|y;a,x)$, matching the absurd assumption we started from.

As noticed by Gallego et al. (2012), the careless usage of this model in the Svetlichny context
may lead one to predict nonlocality where there can't possibly be any. Indeed, it is easy to predict a
high value of CHSH if one's probabilities sum up to 2...
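The two absurd sums are easy to reproduce explicitly. The following sketch tabulates the behavior (G.13) and evaluates the ill-defined expressions side by side with the properly conditioned marginal; the helper names are illustrative only.

```python
from itertools import product

def behavior(a, b, x, y):
    """Signaling behavior (G.13): a equals Bob's input y, b is always 0, x is ignored."""
    return 1.0 if (a == y and b == 0) else 0.0

# The behavior is normalized for every pair of inputs.
for x, y in product((0, 1), repeat=2):
    assert sum(behavior(a, b, x, y) for a, b in product((0, 1), repeat=2)) == 1.0

def careless_prob_b0(x, bob_input):
    """Ill-defined sum_a P(a, b=0 | x, y=bob_input(a)), where y is made to depend on a."""
    return sum(behavior(a, 0, x, bob_input(a)) for a in (0, 1))

def marginal_prob_b0(x, y):
    """Well-defined marginal P(b=0 | x, y) = sum_a P(a, b=0 | x, y)."""
    return sum(behavior(a, 0, x, y) for a in (0, 1))

copy_total = careless_prob_b0(0, lambda a: a)       # Bob inputs y = a
flip_total = careless_prob_b0(0, lambda a: 1 - a)   # Bob inputs y = 1 - a
proper = [marginal_prob_b0(x, y) for x, y in product((0, 1), repeat=2)]
```

Here `copy_total` evaluates to 2 and `flip_total` to 0, while every entry of `proper` equals 1: the problem lies in letting $y$ depend on an output that cannot exist before $y$ is supplied, not in the behavior itself.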


G.4 Jordan's Lemma 


Jordan's lemma states the following. Let $A_0$ and $A_1$ be Hermitian operators on an arbitrary Hilbert
space $\mathcal{H}$, with eigenvalues $-1$ and $+1$. There exists a basis in which both operators are
block-diagonal, in blocks of dimension $2 \times 2$ at most.


G.4.1 Proof of Jordan’s lemma 


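By definition, $A_0^2 = A_1^2 = \mathbb{I}$. Besides, $U = A_0A_1$ is unitary. Let us denote by $|\alpha, 0\rangle$ an
eigenstate of $U$: $U|\alpha, 0\rangle = \omega_\alpha|\alpha, 0\rangle$ with $|\omega_\alpha| = 1$. Then $|\alpha, 1\rangle = A_0|\alpha, 0\rangle$ is also an eigenstate of
$U$, since $U|\alpha, 1\rangle = A_0A_1A_0|\alpha, 0\rangle = A_0U^\dagger|\alpha, 0\rangle = \omega_\alpha^*|\alpha, 1\rangle$. Therefore:

$$A_0|\alpha, 0\rangle = |\alpha, 1\rangle \quad\text{and}\quad A_0|\alpha, 1\rangle = |\alpha, 0\rangle, \tag{G.14}$$

$$A_1|\alpha, 1\rangle = U^\dagger|\alpha, 0\rangle = \omega_\alpha^*|\alpha, 0\rangle \quad\text{and}\quad A_1|\alpha, 0\rangle = A_1\left(\omega_\alpha A_1|\alpha, 1\rangle\right) = \omega_\alpha|\alpha, 1\rangle. \tag{G.15}$$

At this point, there are two possibilities: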
At this point, there are two possibilities: 


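• $|\alpha, 0\rangle \equiv |\xi\rangle$ is an eigenvector of $A_0$ for the eigenvalue $\lambda_{\xi,0} \in \{-1, +1\}$. It follows from what
precedes that $|\alpha, 1\rangle = \lambda_{\xi,0}|\xi\rangle$, and that $|\xi\rangle$ is also an eigenvector of $A_1$ for the eigenvalue
$\lambda_{\xi,1} = \omega_\xi\lambda_{\xi,0}$ (whence $\omega_\xi = -1$ or $+1$).

• $|\alpha, 0\rangle$ is not an eigenvector of $A_0$. In this case, $\langle\alpha, 0|\alpha, 1\rangle = 0$, since they are different
eigenvectors of the normal operator $U$. Now the conditions (G.14) read $A_0|_\alpha = \sigma_x^\alpha$ and
the conditions (G.15) read $A_1|_\alpha = \mathrm{Re}(\omega_\alpha)\sigma_x^\alpha - \mathrm{Im}(\omega_\alpha)\sigma_y^\alpha$, where the $\sigma_k^\alpha$ are the usual Pauli
matrices defined in $\mathrm{Span}\{|\alpha, 0\rangle, |\alpha, 1\rangle\}$ with the convention $\sigma_z^\alpha = |\alpha, 0\rangle\langle\alpha, 0| - |\alpha, 1\rangle\langle\alpha, 1|$.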


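Since the eigenvectors of a unitary operator span the whole Hilbert space, we have proved that

$$A_0 = \bigoplus_\alpha \sigma_x^\alpha \;\oplus\; \bigoplus_\xi \lambda_{\xi,0}\,|\xi\rangle\langle\xi|, \tag{G.16}$$

$$A_1 = \bigoplus_\alpha \left[\mathrm{Re}(\omega_\alpha)\sigma_x^\alpha - \mathrm{Im}(\omega_\alpha)\sigma_y^\alpha\right] \;\oplus\; \bigoplus_\xi \lambda_{\xi,1}\,|\xi\rangle\langle\xi|. \tag{G.17}$$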


This concludes the proof. 
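The construction used in the proof can be tried out numerically. The sketch below is only an illustration: it draws two random Hermitian operators with eigenvalues $\pm 1$, takes each eigenvector $v$ of $U = A_0A_1$, and checks that $\mathrm{Span}\{v, A_0v\}$ has dimension at most 2 and is invariant under both operators. The dimension 6, the ranks 2 and 3, the seed and the tolerances are arbitrary choices made for the example.

```python
import numpy as np

def random_reflection(dim, rank, rng):
    """Random Hermitian operator: eigenvalue +1 on a `rank`-dimensional subspace, -1 elsewhere."""
    g = rng.normal(size=(dim, rank)) + 1j * rng.normal(size=(dim, rank))
    q, _ = np.linalg.qr(g)                      # orthonormal basis of a random subspace
    return 2 * q @ q.conj().T - np.eye(dim)

def jordan_subspaces(a0, a1, tol=1e-8):
    """For each eigenvector v of U = A0 A1, an orthonormal basis of Span{v, A0 v}."""
    _, vecs = np.linalg.eig(a0 @ a1)
    subspaces = []
    for v in vecs.T:
        pair = np.column_stack([v, a0 @ v])
        left, sing, _ = np.linalg.svd(pair, full_matrices=False)
        subspaces.append(left[:, sing > tol])   # a single column if v is an eigenvector of A0
    return subspaces

def is_invariant(basis, op, tol=1e-6):
    """A subspace is invariant under a Hermitian op iff its projector commutes with op."""
    proj = basis @ basis.conj().T
    return np.linalg.norm(proj @ op - op @ proj) < tol

rng = np.random.default_rng(0)
a0 = random_reflection(6, 2, rng)
a1 = random_reflection(6, 3, rng)
subspaces = jordan_subspaces(a0, a1)
block_sizes = sorted(b.shape[1] for b in subspaces)
all_invariant = all(is_invariant(b, a0) and is_invariant(b, a1) for b in subspaces)
```

Each two-dimensional block is found twice, once from each of its two eigenvectors of $U$. For generic operators with these ranks one expects both one-dimensional and two-dimensional blocks to appear in `block_sizes`.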


G.4.2 Usefulness of the lemma 


Jordan's lemma has been used often in the early days of device-independent certification, as a
tool to reduce calculations from unknown Hilbert spaces to qubits. Unfortunately, it cannot be
extended to more operators and/or more eigenvalues, hence its rather limited value.

In the main text, we invoked Jordan's lemma in the proof of self-testing of $S = 2\sqrt{2}$. Specifically,
in subsection 7.1.1 we claimed that: If there is a subspace of $\{A_0, A_1\}$ associated to the eigenvalue
0, it must be of even dimension $2d'_A$, and that there exists a choice of basis in which $A_0 = \sigma_z \otimes \mathbb{I}_{d'_A}$
and $A_1 = \sigma_x \otimes \mathbb{I}_{d'_A}$. This is indeed a corollary of Jordan's lemma: In the subspaces indexed by $\xi$,
$\{A_0, A_1\}|\xi\rangle = 2\omega_\xi|\xi\rangle$ with $2\omega_\xi \in \{-2, +2\}$, so to find the eigenvalue 0 we need to stay in the subspaces indexed
by $\alpha$, which are even-dimensional. Furthermore,

$$\left\{\sigma_x^\alpha,\ \mathrm{Re}(\omega_\alpha)\sigma_x^\alpha - \mathrm{Im}(\omega_\alpha)\sigma_y^\alpha\right\} = 2\,\mathrm{Re}(\omega_\alpha)\,\mathbb{I}$$

because $\{\sigma_x^\alpha, \sigma_y^\alpha\} = 0$. Therefore to have the eigenvalue 0 one has to restrict to sectors such that
$A_1|_\alpha = \pm\sigma_y^\alpha$. The claim made in the text follows by choosing a basis in which $\sigma_x^\alpha = \sigma_z$ and $\pm\sigma_y^\alpha = \sigma_x$
for all $\alpha$ under consideration.


G.5 A Case Study of Randomness with Characterised 
Devices 


G.5.1 Introduction and basic result 


We consider the example of extraction of randomness from characterized devices discussed in
8.2.4: Claire sees $\rho = \frac{1}{2}(\mathbb{I} + \eta\sigma_x)$ and performs the projective measurement of $\sigma_z$. Eve's most
favorable decomposition of $\rho$ is easy to guess (Figure G.1), and optimality can indeed be proved
easily assuming that the device is really producing two pure states with equal probability, such that
$\rho = \frac{1}{2}\rho_+ + \frac{1}{2}\rho_-$ with $\rho_\pm = |\chi_\pm\rangle\langle\chi_\pm|$. The proof of optimality starting without any assumption is
cumbersome and we leave it aside.

In summary, we have to assume that Eve sees one of the two states

$$|\chi_\pm\rangle = \frac{1}{2}\left(\sqrt{1+\eta} \pm \sqrt{1-\eta}\right)|{+\hat{z}}\rangle + \frac{1}{2}\left(\sqrt{1+\eta} \mp \sqrt{1-\eta}\right)|{-\hat{z}}\rangle. \tag{G.18}$$

Upon seeing $|\chi_\gamma\rangle$, she guesses $c = \gamma$ and the guess is correct with probability

$$P_g = \frac{1}{2}\left(1 + \sqrt{1-\eta^2}\right). \tag{G.19}$$


Figure G.1 Bloch sphere representation of the randomness scenario of appendix G.5, with the
decomposition of Claire's mixed state that optimizes Eve's probability of guessing the output bit.

Notice in particular that

$$\langle{+\hat{z}}|\chi_+\rangle = \langle{-\hat{z}}|\chi_-\rangle = \sqrt{P_g}, \qquad \langle{+\hat{z}}|\chi_-\rangle = \langle{-\hat{z}}|\chi_+\rangle = \sqrt{1-P_g} \tag{G.20}$$

which will be useful later.

Now we are going to use this toy model to exhibit a difference between classical and quantum
side-information. To do so, we consider two rounds of the process, at the end of which the parity of the
two bits is revealed. We are going to compute what is the probability that Eve guesses correctly the
first bit (the other being then trivially inferred).


G.5.2 Calculation with classical side-information 


Due to the symmetry of the problem, we can assume that Eve sees the states $|\chi_+\rangle|\chi_+\rangle$ in the two
rounds. Claire's bits are then distributed according to the following $P_{c_1c_2}$: $P_{++} = P_g^2$, $P_{+-} = P_{-+} =
P_g(1-P_g)$, $P_{--} = (1-P_g)^2$. Without further information, Eve would guess that the first bit is $+1$,
and this guess would be correct with probability $P_g$.

Now Claire reveals the parity:

• In a fraction $P_{\text{even}} = P_g^2 + (1-P_g)^2$ of the rounds, Claire will announce $c_1c_2 = 1$. Then Eve's
guess for the first bit remains $+1$, correct with conditional probability $\frac{P_g^2}{P_{\text{even}}}$.

• In a fraction $P_{\text{odd}} = 2P_g(1-P_g)$ of the rounds, Claire will announce $c_1c_2 = -1$. In these
rounds, Eve's guess is random: Let's say that she sticks to $+1$, but she'll be correct with
conditional probability $\frac{1}{2}$.


On average over all the rounds, Eve's probability of guessing the first bit is

$$P'_g = P_{\text{even}} \times \frac{P_g^2}{P_{\text{even}}} + P_{\text{odd}} \times \frac{1}{2} = P_g. \tag{G.21}$$


In other words, the guessing probability is unchanged! The fact that Claire revealed the parity 
helps only insofar as, if Eve correctly guesses the first bit, she will correctly guess the second as 
well. 


G.5.3 Calculation with quantum side-information 


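In this case, Eve does not see a particular pair of states in each round: Rather, Claire's perceived
mixture comes about because Eve is entangled with her system and keeps the purification. So, for
our two-round experiment, the Claire-Eve state is

$$|\Psi\rangle = \left[\frac{1}{\sqrt{2}}\left(|\chi_+\rangle|e_1^+\rangle + |\chi_-\rangle|e_1^-\rangle\right)\right] \otimes \left[\frac{1}{\sqrt{2}}\left(|\chi_+\rangle|e_2^+\rangle + |\chi_-\rangle|e_2^-\rangle\right)\right]. \tag{G.22}$$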


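Now Claire reveals the parity:

• With probability $\frac{1}{2}$ she announces $c_1c_2 = 1$. Then Eve knows that she has one of the two
states

$$\begin{aligned}
|\psi_{E,++}\rangle &= \left(\sqrt{P_g}\,|e_1^+\rangle + \sqrt{1-P_g}\,|e_1^-\rangle\right) \otimes \left(\sqrt{P_g}\,|e_2^+\rangle + \sqrt{1-P_g}\,|e_2^-\rangle\right)\\
|\psi_{E,--}\rangle &= \left(\sqrt{1-P_g}\,|e_1^+\rangle + \sqrt{P_g}\,|e_1^-\rangle\right) \otimes \left(\sqrt{1-P_g}\,|e_2^+\rangle + \sqrt{P_g}\,|e_2^-\rangle\right)
\end{aligned}$$

with equal a priori probability; we have used (G.20). She would like to find out which one
she actually has. The classic result by Helstrom says that the highest probability of correct
guessing in such a task is given by $\frac{1}{2}\left(1 + \sqrt{1 - |\langle\psi_{E,++}|\psi_{E,--}\rangle|^2}\right)$. Now:

$$|\langle\psi_{E,++}|\psi_{E,--}\rangle| = \left(2\sqrt{P_g(1-P_g)}\right)^2 = \eta^2$$

whence $P_{g,\text{even}} = \frac{1}{2}\left(1 + \sqrt{1-\eta^4}\right)$.

• With probability $\frac{1}{2}$ she announces $c_1c_2 = -1$. Then Eve knows that she has one of the two
states

$$\begin{aligned}
|\psi_{E,+-}\rangle &= \left(\sqrt{P_g}\,|e_1^+\rangle + \sqrt{1-P_g}\,|e_1^-\rangle\right) \otimes \left(\sqrt{1-P_g}\,|e_2^+\rangle + \sqrt{P_g}\,|e_2^-\rangle\right)\\
|\psi_{E,-+}\rangle &= \left(\sqrt{1-P_g}\,|e_1^+\rangle + \sqrt{P_g}\,|e_1^-\rangle\right) \otimes \left(\sqrt{P_g}\,|e_2^+\rangle + \sqrt{1-P_g}\,|e_2^-\rangle\right)
\end{aligned}$$

with equal a priori probability. Once again, $|\langle\psi_{E,+-}|\psi_{E,-+}\rangle| = \eta^2$ and therefore again
$P_{g,\text{odd}} = \frac{1}{2}\left(1 + \sqrt{1-\eta^4}\right)$.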


In summary, Eve's probability of guessing the first bit correctly has now increased to

$$P'_g = \frac{1}{2}\left(1 + \sqrt{1-\eta^4}\right). \tag{G.23}$$

The underlying physical reason is of course that the measurement that achieves the optimal
guessing probability is an entangled measurement on both Eve's systems $E_1$ and $E_2$. If she were
restricted to perform local measurements, her strategy would reduce to one with classical
side-information.
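The comparison between the two kinds of side-information can be reproduced with a few lines of linear algebra. In the sketch below, offered as an illustration only, Eve's conditional single-round states are written in her basis $\{|e^+\rangle, |e^-\rangle\}$ with the amplitudes $\sqrt{P_g}$ and $\sqrt{1-P_g}$, and the Helstrom bound is evaluated from the trace norm of $\frac{1}{2}(\rho_0 - \rho_1)$ rather than from the overlap formula, so that the two can be compared. The value $\eta = 0.6$ and the function names are arbitrary choices.

```python
import numpy as np

def helstrom(psi0, psi1):
    """Optimal probability of discriminating two equiprobable pure states."""
    diff = 0.5 * (np.outer(psi0, psi0.conj()) - np.outer(psi1, psi1.conj()))
    return 0.5 + 0.5 * np.sum(np.abs(np.linalg.eigvalsh(diff)))

def guessing_probabilities(eta):
    """Return (P_g, classical P'_g, quantum P'_g) for the two-round parity game."""
    p_g = 0.5 * (1 + np.sqrt(1 - eta**2))                 # (G.19)

    # Classical side-information: average over the announced parity.
    p_even = p_g**2 + (1 - p_g)**2
    p_odd = 2 * p_g * (1 - p_g)
    classical = p_even * (p_g**2 / p_even) + p_odd * 0.5

    # Quantum side-information: Eve's single-round states given Claire's bit.
    plus = np.array([np.sqrt(p_g), np.sqrt(1 - p_g)])     # Claire's bit is +1
    minus = np.array([np.sqrt(1 - p_g), np.sqrt(p_g)])    # Claire's bit is -1
    even = helstrom(np.kron(plus, plus), np.kron(minus, minus))
    odd = helstrom(np.kron(plus, minus), np.kron(minus, plus))
    quantum = 0.5 * even + 0.5 * odd
    return p_g, classical, quantum

eta = 0.6
p_g, classical, quantum = guessing_probabilities(eta)
```

For $\eta = 0.6$ one has $P_g = 0.9$: The classical value stays at $P_g$, while the quantum value matches $\frac{1}{2}(1 + \sqrt{1 - \eta^4})$ and is strictly larger.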


G.6 Simulations of the Singlet Behavior 


³ Indeed, by no-signaling, it does not matter when Eve performs her measurement: When at the start of the
previous subsection we assumed that the initial state was $|\chi_+\rangle|\chi_+\rangle$, this could have been steered by Eve from
the state (G.22) by measuring her systems in the computational basis.

In the main text, we have described how the singlet behavior (9.4)

$$P(a, b|\hat{a}, \hat{b}) = \frac{1}{4}\left(1 - ab\,\hat{a}\cdot\hat{b}\right), \qquad a, b \in \{-1, +1\},\ \hat{a}, \hat{b} \in S^2 \tag{G.24}$$

can be simulated if in each round the players, alongside classical information, share a PR-box
(section 9.2) or can send one bit of information (section 11.2). In this appendix, we present the
constructive protocols in the version of Degorre et al. (2005), which unifies both approaches and
even connects them with Werner's local variable model for mixed states (subsection 3.3.1). The
idea is to use the nonlocal resource in such a way that the local variable shared between Alice and
Bob, a vector $\hat{\lambda}$ on the unit sphere, ends up distributed according to

$$\rho(\hat{\lambda}) = \frac{1}{2\pi}\,|\hat{a}\cdot\hat{\lambda}|. \tag{G.25}$$

20 
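Then, if Alice outputs $a = \mathrm{sign}(\hat{a}\cdot\hat{\lambda})$ and Bob outputs $b = -\mathrm{sign}(\hat{b}\cdot\hat{\lambda})$, one has $\langle a\rangle = \langle b\rangle = 0$ and

$$\langle ab\rangle = -\int_{S^2} d\hat{\lambda}\ \frac{1}{2\pi}\,\underbrace{|\hat{a}\cdot\hat{\lambda}|\,\mathrm{sign}(\hat{a}\cdot\hat{\lambda})}_{=\,\hat{a}\cdot\hat{\lambda}}\,\mathrm{sign}(\hat{b}\cdot\hat{\lambda}) = -\hat{a}\cdot\hat{b}$$

as readily proved by passing to spherical coordinates chosen such that $\hat{a} = \hat{z}$ and $\hat{b} = \cos\beta\,\hat{z} +
\sin\beta\,\hat{x}$. Clearly, the distribution (G.25) could not have been arranged prior to the round of the test,
as it depends on Alice's actual input $\hat{a}$.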


G.6.1  Alice’s sampling 


As a first step, let us see how Alice can generate locally the distribution (G.25). Consider first the
“rejection method”:

1. Pick $\hat{\lambda}$ uniformly on $S^2$ and $u$ uniformly in $[0, 1]$;
2. Keep $\hat{\lambda}$ if $|\hat{a}\cdot\hat{\lambda}| \ge u$, discard it otherwise.

The probability that a given $\hat{\lambda}$ is kept is the probability that one draws a value $u$ smaller than $|\hat{a}\cdot\hat{\lambda}|$;
since $u$ is drawn uniformly, this probability is just $|\hat{a}\cdot\hat{\lambda}|$. Therefore $\rho(\hat{\lambda}) \propto |\hat{a}\cdot\hat{\lambda}|$. By normalizing
a posteriori, one finds the factor $\frac{1}{2\pi}$. Thus, a $\hat{\lambda}$ chosen with the rejection method is distributed
according to (G.25).

The problem with the rejection method, as the name indicates, is that several instances of $\hat{\lambda}$ are
discarded (half of them on average). This is solved by noticing that, if $\hat{\lambda}$ is drawn uniformly in $S^2$,
$u = |\hat{a}\cdot\hat{\lambda}|$ is uniform in $[0, 1]$. Assume then that Alice starts with two unit vectors $\hat{\lambda}_0$ and $\hat{\lambda}_1$ drawn
independently, each with uniform distribution on the unit sphere $S^2$. She can then sample her $\hat{\lambda}$
according to this rule (“choice method”): If $|\hat{a}\cdot\hat{\lambda}_0| \ge |\hat{a}\cdot\hat{\lambda}_1|$, set $\hat{\lambda} = \hat{\lambda}_0$; otherwise, set $\hat{\lambda} = \hat{\lambda}_1$. Now,
whenever $\hat{\lambda} = \hat{\lambda}_0$ is selected, it is using the rejection method with $u_0 = |\hat{a}\cdot\hat{\lambda}_1|$; whenever $\hat{\lambda} = \hat{\lambda}_1$ is
selected, it is using the rejection method with $u_1 = |\hat{a}\cdot\hat{\lambda}_0|$. Therefore, in each round the $\hat{\lambda}$ chosen
by Alice is distributed according to (G.25).
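Both sampling rules are straightforward to try out. The sketch below is a Monte Carlo illustration with arbitrary sample size and seed: It implements the rejection method and the choice method and compares a simple statistic of the samples. Under the distribution (G.25) the variable $|\hat{a}\cdot\hat{\lambda}|$ has density $2u$ on $[0, 1]$, hence mean $2/3$, while the rejection method is expected to keep half of the candidates.

```python
import numpy as np

def uniform_sphere(n, rng):
    """n unit vectors drawn uniformly on the sphere S^2."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def rejection_method(a_hat, n, rng):
    """Draw n candidates and keep those with |a . lambda| >= u, u uniform in [0, 1]."""
    lam = uniform_sphere(n, rng)
    u = rng.uniform(size=n)
    return lam[np.abs(lam @ a_hat) >= u]

def choice_method(a_hat, n, rng):
    """From two independent uniform vectors, keep the one with the larger |a . lambda|."""
    lam0, lam1 = uniform_sphere(n, rng), uniform_sphere(n, rng)
    pick0 = np.abs(lam0 @ a_hat) >= np.abs(lam1 @ a_hat)
    return np.where(pick0[:, None], lam0, lam1)

rng = np.random.default_rng(1)
n_samples = 200_000
a_hat = np.array([0.0, 0.0, 1.0])

kept = rejection_method(a_hat, n_samples, rng)
chosen = choice_method(a_hat, n_samples, rng)

acceptance = len(kept) / n_samples             # expected close to 1/2
mean_rejection = np.abs(kept @ a_hat).mean()   # expected close to 2/3
mean_choice = np.abs(chosen @ a_hat).mean()    # expected close to 2/3
```

The choice method returns one vector per round with no discarded instances, which is what makes it usable in the simulation protocols.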


G.6.2 Application to simulation 


Let us now suppose that, prior to each round, Alice and Bob share two unit vectors $\hat{\lambda}_0$ and $\hat{\lambda}_1$
drawn independently and uniformly on the sphere. The simulation of the singlet behavior with one
bit of communication is trivial: Alice uses the choice method to pick one of the two vectors
and tells Bob which one she has chosen; Bob uses the same one to produce his output.⁴

The simulation with one use of the PR-box is only slightly more involved. First notice that which
$\hat{\lambda}_j$ Bob uses does not matter if $\mathrm{sign}(\hat{b}\cdot\hat{\lambda}_0) = \mathrm{sign}(\hat{b}\cdot\hat{\lambda}_1)$: So, if by default Bob outputs $b = -\mathrm{sign}(\hat{b}\cdot\hat{\lambda}_0)$,
he has to flip his output only when both $\hat{\lambda}_1$ is chosen by Alice and $\mathrm{sign}(\hat{b}\cdot\hat{\lambda}_0) = -\mathrm{sign}(\hat{b}\cdot\hat{\lambda}_1)$.
Consider then the following protocol:

• Alice runs the choice method and chooses $\hat{\lambda} = \hat{\lambda}_j$. She inputs $x = j$ in the PR-box and receives
the output $\alpha \in \{0, 1\}$. Finally, Alice outputs $a = \mathrm{sign}((-1)^\alpha\,\hat{a}\cdot\hat{\lambda}_j)$.

• Bob checks if $\mathrm{sign}(\hat{b}\cdot\hat{\lambda}_0) = \mathrm{sign}(\hat{b}\cdot\hat{\lambda}_1)$: If it is, he inputs $y = 0$ in the PR-box; otherwise, he
inputs $y = 1$. He receives the output $\beta \in \{0, 1\}$. Finally, Bob outputs $b = -\mathrm{sign}((-1)^\beta\,\hat{b}\cdot\hat{\lambda}_0)$.


The conditions for xy = 1 are exactly the conditions under which Bob’s bit should be flipped in 
the previous narrative. Here, when xy = 1 either player flips the output but not both; whereas in 
the rounds where xy = 0, either both players keep their output or both flip it. This is a necessity 
of the PR-box being no-signaling: Bob should not get to know if $\hat{\lambda}_1$ was chosen.
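The PR-box protocol can be simulated directly. In the sketch below, which is an illustration rather than a reference implementation, the PR-box is modeled in the usual way: Alice's output $\alpha$ is a uniformly random bit and Bob's output is $\beta = \alpha \oplus xy$. The measurement directions, the sample size and the seed are arbitrary.

```python
import numpy as np

def uniform_sphere(n, rng):
    """n unit vectors drawn uniformly on the sphere S^2."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def simulate_singlet_with_pr_box(a_hat, b_hat, n, rng):
    """Outputs (a, b) of n rounds of the protocol using one PR-box per round."""
    lam0, lam1 = uniform_sphere(n, rng), uniform_sphere(n, rng)

    # Alice: choice method; x is the index j of the selected vector.
    x = (np.abs(lam1 @ a_hat) > np.abs(lam0 @ a_hat)).astype(int)
    lam = np.where(x[:, None] == 1, lam1, lam0)

    # Bob: y = 1 exactly when the two shared vectors would give him different signs.
    y = (np.sign(lam0 @ b_hat) != np.sign(lam1 @ b_hat)).astype(int)

    # PR-box: alpha is uniformly random and alpha XOR beta = x AND y.
    alpha = rng.integers(0, 2, size=n)
    beta = alpha ^ (x & y)

    a = np.sign((-1.0) ** alpha * (lam @ a_hat))
    b = -np.sign((-1.0) ** beta * (lam0 @ b_hat))
    return a, b

rng = np.random.default_rng(2)
a_hat = np.array([0.0, 0.0, 1.0])
b_hat = np.array([np.sin(0.7), 0.0, np.cos(0.7)])
a, b = simulate_singlet_with_pr_box(a_hat, b_hat, 200_000, rng)

correlation = np.mean(a * b)        # expected close to -a_hat . b_hat
marginals = (a.mean(), b.mean())    # expected close to (0, 0)
```

Bob's output is computed from $\hat{\lambda}_0$ and his own PR-box output only, so he never learns which vector Alice selected, in line with the no-signaling remark above.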

Interestingly, the case where Alice samples according to the choice method but no additional
nonlocal resources are available is also instructive, because it connects with other considerations. One
can imagine that Alice outputs $a = \mathrm{sign}(\hat{a}\cdot\hat{\lambda})$ with the $\hat{\lambda}$ chosen with the choice method, while Bob
outputs $b = -\mathrm{sign}(\hat{b}\cdot\hat{\lambda}_0)$. This will produce the correct correlations in half of the rounds, while in
the other half the outcomes will be uncorrelated (because Alice has used $\hat{\lambda}_1$, which is independent of
$\hat{\lambda}_0$). The resulting behavior, which must be local, is nothing else than that of the Werner state with
$W = \frac{1}{2}$. Alternatively, one can imagine that, knowing that Bob is going to use $\hat{\lambda}_0$, Alice refuses to
produce an output in the rounds when she should have used $\hat{\lambda}_1$. The post-selected behavior is the
singlet behavior. This is an example of simulation exploiting the detection loophole, in which Alice
has 50% efficiency (we saw in subsection 1.5.2 that with those efficiencies one could actually reach
$S = 4$).
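The two variants without nonlocal resources can be illustrated in the same Monte Carlo fashion. The sketch below, with arbitrary directions, sample size and seed, estimates the correlation over all rounds, expected near $-\frac{1}{2}\hat{a}\cdot\hat{b}$, and the correlation post-selected on the rounds in which Alice's choice method picked $\hat{\lambda}_0$, expected near $-\hat{a}\cdot\hat{b}$ with about 50% of the rounds kept.

```python
import numpy as np

def uniform_sphere(n, rng):
    """n unit vectors drawn uniformly on the sphere S^2."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def local_strategy(a_hat, b_hat, n, rng):
    """Alice uses the choice method; Bob always uses lambda_0. No communication."""
    lam0, lam1 = uniform_sphere(n, rng), uniform_sphere(n, rng)
    pick0 = np.abs(lam0 @ a_hat) >= np.abs(lam1 @ a_hat)
    lam = np.where(pick0[:, None], lam0, lam1)
    a = np.sign(lam @ a_hat)
    b = -np.sign(lam0 @ b_hat)
    return a, b, pick0

rng = np.random.default_rng(4)
a_hat = np.array([0.0, 0.0, 1.0])
b_hat = np.array([np.sin(0.5), 0.0, np.cos(0.5)])
a, b, pick0 = local_strategy(a_hat, b_hat, 200_000, rng)

corr_all = np.mean(a * b)                 # all rounds: Werner-like, W = 1/2
corr_postselected = np.mean((a * b)[pick0])   # Alice answers only when she picked lambda_0
efficiency = pick0.mean()                 # fraction of rounds in which Alice answers
```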


G.7 Properties of the Variational Distance 


The proofs of subsections 9.4.2 and 11.5.3 use the notion of variational distance between two
probability distributions on the same alphabet

$$D[P_C, Q_C] = \frac{1}{2}\sum_{c\in\mathcal{C}} |P_C(c) - Q_C(c)|. \tag{G.26}$$


Here we prove three properties that are used in those derivations: 


⁴ Notice that, in this protocol, Alice's communication cannot be compressed over several runs, because the
bit is unbiased. This shows that this protocol is not identical to the one of Toner and Bacon (2003), where some
bias in the communication could be exploited to compress the average information down to approximately 0.85
bits per round.

• The triangle inequality follows immediately from $|a + b| \le |a| + |b|$: Indeed,

$$\begin{aligned}
D[P, R] &= \frac{1}{2}\sum_c |P(c) - R(c)| = \frac{1}{2}\sum_c \left|(P(c) - Q(c)) + (Q(c) - R(c))\right|\\
&\le \frac{1}{2}\sum_c \left(|P(c) - Q(c)| + |Q(c) - R(c)|\right) = D[P, Q] + D[Q, R].
\end{aligned}$$

Thus, $D$ is indeed a distance, the three other requirements being obviously satisfied:
$D[P, Q] \ge 0$, $D[P, Q] = 0$ if and only if $P = Q$, and $D[P, Q] = D[Q, P]$.


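• Given a joint probability distribution $P_{AB}$ on two identical alphabets $\mathcal{A} = \mathcal{B}$, the distance
between the marginal distributions is bounded by $D[P_A, P_B] \le P_{AB}(a \neq b)$. Indeed,

$$\begin{aligned}
D[P_A, P_B] &= \frac{1}{2}\sum_\xi |P(a = \xi) - P(b = \xi)|\\
&\overset{*}{=} \frac{1}{2}\sum_\xi \Big|\sum_{b\neq\xi} P(\xi, b) - \sum_{a\neq\xi} P(a, \xi)\Big|\\
&\le \frac{1}{2}\Big(\sum_\xi\sum_{b\neq\xi} P(\xi, b) + \sum_\xi\sum_{a\neq\xi} P(a, \xi)\Big)\\
&= P(a \neq b)
\end{aligned}$$

where in the starred equality we have used $P(a = \xi) = P(\xi, \xi) + \sum_{b\neq\xi} P(\xi, b)$ and $P(b = \xi) =
P(\xi, \xi) + \sum_{a\neq\xi} P(a, \xi)$.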


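• For bits $\mathcal{A} = \{0, 1\}$, it holds $D[P_A, \bar{P}_A] = 2D[P_A, Q_A]$, where $\bar{P}_A$ is defined by $\bar{P}(a) = 1 -
P(a) = P(1 - a)$ and $Q_A(a) = \frac{1}{2}$. Indeed,

$$\begin{aligned}
D[P_A, \bar{P}_A] &= \frac{1}{2}\sum_\xi |P(1 - a = \xi) - P(a = \xi)|\\
&= \frac{1}{2}\sum_\xi |1 - 2P(a = \xi)|\\
&= \sum_\xi \Big|\frac{1}{2} - P(a = \xi)\Big| = 2D[P_A, Q_A].
\end{aligned}$$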
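The three properties can be spot-checked on random distributions. The sketch below is a numerical illustration only (alphabet sizes and seed are arbitrary choices) and does not replace the proofs above.

```python
import numpy as np

def variational_distance(p, q):
    """D[P, Q] = (1/2) sum_c |P(c) - Q(c)|, as in (G.26)."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

rng = np.random.default_rng(3)

def random_distribution(shape):
    """Random probability distribution with the given array shape."""
    w = rng.random(shape)
    return w / w.sum()

# 1. Triangle inequality on an alphabet of five symbols.
p, q, r = (random_distribution(5) for _ in range(3))
triangle_ok = variational_distance(p, r) <= variational_distance(p, q) + variational_distance(q, r) + 1e-12

# 2. Distance between marginals bounded by P(a != b), for a joint distribution on 4 x 4 symbols.
joint = random_distribution((4, 4))
marg_a, marg_b = joint.sum(axis=1), joint.sum(axis=0)
prob_differ = 1.0 - np.trace(joint)
marginal_ok = variational_distance(marg_a, marg_b) <= prob_differ + 1e-12

# 3. For a bit, the distance to the flipped distribution is twice the distance to uniform.
bit = random_distribution(2)
flipped = bit[::-1]
bits_gap = abs(variational_distance(bit, flipped) - 2 * variational_distance(bit, [0.5, 0.5]))
```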


G.8 Information Causality from Desiderata 
on Information Entropies 


We have mentioned in subsection 10.2.3 that the condition (10.3) for Information Causality (IC)
holds under basic requirements on the information entropy of the underlying theory. For the
convenience of the reader, we reproduce here the proof of Al-Safi and Short (2011) with our
notation and some additional comments.

The two requirements on the information entropy $H$ of the theory are:

1. Consistency with classical information theory: If $A$ is a resource that can be described in
classical theory, then

$$H(A) = H_c(A) \tag{G.27}$$

where $H_c$ is the Shannon entropy for classical variables.

2. A form of the data-processing inequality: If $B \to B'$ denotes a transformation on Bob's
system alone, it must hold

$$H(A|B') \ge H(A|B) \tag{G.28}$$

where $H(A|B) = H(A, B) - H(B)$. Using this expression, it is useful to consider the case
where $B \to B'$ consists in simply discarding the system: In this case, (G.28) becomes $H(A) \ge
H(A, B) - H(B)$, whence subadditivity follows:

$$H(A, B) \le H(A) + H(B). \tag{G.29}$$


Now we can present the proof. The resource of Bob is denoted by $\rho_B$ with a hint to the quantum
formalism: But this is purely a notation, quantum theory is not assumed here.

Since Bob's guess $\beta_y$ is obtained from processing his resource $\rho_B$ and Alice's message $m \in \mathcal{M} =
\{0, 1\}^k$, it holds

$$H_c(X_y|B_y) \overset{\text{(G.28)}}{\ge} H(X_y|\rho_B, M). \tag{G.30}$$


Notice that there is no distribution on $\rho_B$, since this is the resource the players are using; but it
might have an entropy, and this is why on the r.h.s. we have to work with the information entropy
of the theory. Now we denote $\vec{X} = (X_0, \ldots, X_{N-1})$ the collection of all Alice's inputs. The following
bounds hold

$$\begin{aligned}
\sum_y H(X_y|\rho_B, M) &\overset{\text{(G.29)}}{\ge} H(\vec{X}|\rho_B, M) = H(\vec{X}, \rho_B, M) - H(\rho_B, M)\\
&\overset{\text{(G.29)}}{\ge} H(\vec{X}, \rho_B, M) - H(\rho_B) - H_c(M)\\
&\overset{*}{=} H(\vec{X}, \rho_B, M) - H(\vec{X}, \rho_B) + H_c(\vec{X}) - H_c(M)\\
&= H(M|\vec{X}, \rho_B) + H_c(\vec{X}) - H_c(M)\\
&\ge H_c(\vec{X}) - k
\end{aligned}$$


where for the step $*$ we used the fact that Bob's resource must be uncorrelated with Alice's inputs;
for the last step, we used the facts that the conditional entropy is positive and that $H_c(M) \le k$
because Alice sends that many bits.

This bound is already a form of IC that does not assume the distribution of Alice's inputs. If we
add that her inputs must be uncorrelated, as we assumed in the main text, then $H_c(\vec{X}) = \sum_y H_c(X_y)$
and we recover (10.3).